Abstract

Systems exploiting network coding to increase their throughput suffer greatly from pollution attacks,
which consist of injecting malicious packets into the network. Pollution attacks are amplified by the
network coding process, resulting in greater damage than under traditional routing. In this paper,
we address this issue by designing an unconditionally secure authentication code suitable for multicast
network coding. The proposed scheme is robust against pollution attacks from outsiders, as well as from
coalitions of malicious insiders. Intermediate nodes can verify the integrity and origin of the packets
received without having to decode, and thus detect and discard in transit the malicious messages that
fail the verification. This way, the pollution is canceled out before reaching the destinations. We analyze
the performance of the scheme in terms of both multicast throughput and goodput, and show the goodput
gains. We also discuss applications to file distribution.


1 Introduction 

Network coding was first introduced in [1] as an innovative approach to characterize the rate region of
multicast networks. Network coding allows intermediate nodes between the source(s) and the destinations
not only to store and forward, but also to encode the received packets before forwarding them. In [5], Li
et al. showed that linear coding suffices to achieve the max-flow from the source to each receiving node
in multicast networks, where intermediate nodes generate outgoing packets as linear combinations of their
incoming packets. In line with [2], [3] gave an algebraic framework for linear network coding, with further
developments for arbitrary networks and robust networking. For practical issues, [3] proposed a network
coding framework that can deal with random packet loss, changes of topology, and delays.

Network coding offers various advantages, not only for maximizing the usage of network resources but
also for robustness to network impairments and packet losses. Various applications of network coding have
therefore appeared, ranging from file download and content distribution in peer-to-peer networks [6], [7] to
distributed file storage systems [21 [TU].

While much of the literature on network coding discusses network capacity or throughput, it is also
natural to ask about the impact of network coding on network security. Pollution attacks, which consist
of injecting malicious packets into the network, are for example more dangerous for systems exploiting
network coding than for those using traditional routing. In this scenario, malicious packets may come
from the modification of received packets by a malicious intermediate node, or from bogus packets created
and injected into the network by an outside adversary. If no integrity check is performed on packets in
transit, an honest intermediate node receiving a single malicious packet will encode it together with other
packets, producing multiple corrupted outgoing packets that are then forwarded to the next nodes. The
corrupted packets then propagate throughout the network, and the damage is amplified by the network
coding process.

*F. Oggier is with the Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological
University, Singapore. Email: frederique@ntu.edu.sg. H. Fathi is with the Center for TeleInfrastruktur, Aalborg University, Denmark.
Email: hf@es.aau.dk. Part of this work was presented in an invited paper at the Allerton Conference 2008.
1.1 Authentication techniques

One way to address the pollution attack problem is through authentication techniques. Packets in transit
at the intermediate nodes should be authenticated before being encoded and forwarded, to verify both their
origin and their content. The goal is to achieve authentication even in the presence of both inside and outside
attackers who can observe the messages flowing through the network and inject selected messages. The
success of their attacks depends on their ability to send a message that will be accepted as valid (i.e., an
impersonation attack), or to observe a message and then alter its content (i.e., a substitution attack), in
such a way that intermediate nodes and destinations cannot detect it.

Let us recall that authentication consists of the following properties, though we will focus here only on 
the first two: 

• data integrity: protecting the data from any modification by malicious entities, 

• data origin authentication: validating the identity of the origin of the data, 

• non-repudiation: guaranteeing that the origin of the data cannot deny having created and sent data. 

To satisfy these properties, the source appends to each message either a digital signature, a message
authentication code (MAC), or an authentication code (also called a tag). There are subtle differences among
these techniques. First, MACs and authentication codes ensure data integrity and data origin authentication,
while digital signatures also provide non-repudiation. Second, MACs, authentication codes, and digital
signatures should be differentiated by the type of security they achieve: computational security
(i.e., security that holds against attackers with bounded computational resources, but may fail against an
attacker with unlimited computational resources) or unconditional security (i.e., security that holds even
against an attacker with unlimited computational resources). MACs are proven to be computationally secure,
while the security of authentication codes is unconditional [11]. Digital signature schemes exist for both
computational and unconditional security. However, while computationally secure digital signatures can be
verified by anyone with a public verification algorithm, unconditionally secure digital signatures can only be
verified by intended receivers, as is the case for MACs and authentication codes.

1.2 Related work 

Several authentication schemes have recently been proposed in the literature to detect polluted packets at
intermediate nodes [8], [13], [14], [15], [16]. All of them are based on cryptographic functions with
computational assumptions, as detailed below.

The scheme in [8] for network-coded content distribution allows intermediate nodes to detect malicious
packets injected into the network and to alert neighboring nodes when a malicious packet is detected. It uses
a homomorphic hash function to generate hash values of the encoded blocks of data, which are sent to
the intermediate nodes and destinations before the encoded data. These hash values must be transmitted
over a pre-established secure channel, which makes the scheme impractical. The use of hash functions places
the scheme in the category of computationally secure schemes.

The signature scheme in [13] is a homomorphic signature scheme based on the Weil pairing over elliptic
curves, while the one proposed in [14] is a homomorphic signature scheme based on RSA. In both schemes,
intermediate nodes can authenticate the packets in transit without decoding, and can generate a verifiable
signature of the packet they have just encoded without knowing the signer's secret key. However, these
schemes require one key pair for each file to be verified, which is not practical either.

The signature scheme proposed in [15] uses a standard signature scheme based on the hardness of the
discrete logarithm problem. The blocks of data are considered as vectors spanning a subspace. The signature
is computed not on the vectors containing data blocks, but on vectors orthogonal to all data vectors in the
given subspace. Signature verification then checks whether the received vector belongs to the data subspace.
The security of the scheme holds in the sense that no adversary knowing a signature on a given subspace of
data vectors can forge a valid signature for any vector outside this subspace. This scheme also requires fresh
keys for every file.
Finally, the signature schemes given in [16] follow the approach of [15], with improvements in terms
of public key size and per-packet overhead. The proposed signature schemes are designed to authenticate
the linear subspace formed by the vectors containing data blocks. A signature on a linear subspace suffices
to authenticate all the vectors in that subspace. With these schemes, a single public key can be used
to verify multiple files.

1.3 Organization and contribution 

In this paper, we propose an unconditionally secure solution that makes multicast network coding robust
against pollution attacks. Our solution allows intermediate nodes and destinations to verify the data origin
and integrity of the messages received without decoding, and thus to detect and discard the malicious
messages that fail the verification. Note that destinations must still receive a sufficient number of
uncorrupted messages to decode and recover the entire file sent by the source. However, our solution enables
the destinations to filter out corrupted messages themselves, and to have them filtered out by intermediate
nodes as well.

Our scheme aims for unconditional security: we rely on information-theoretic arguments rather than on
problems that are believed to be hard, as in [8], [14], [15], [16]. Unconditional authentication codes have led
to the development of multi-receiver authentication codes [17], [18], which are highly relevant in the context of
network coding. Multi-receiver authentication codes allow any one of the receivers (in the context of network
coding, intermediate nodes and destinations) to verify the integrity and origin of a received message, but
require the source to be designated. Our scheme is inspired by the $(k, V)$ multi-receiver authentication code
proposed in [18], which is robust against a coalition of $k-1$ malicious receivers among $V$, and in which
every key can be used to authenticate up to $M$ messages. We define $(k, V)$ multi-receiver authentication
codes for network coding and adapt their use so that intermediate nodes can detect malicious packets
without having to decode.

Our scheme adapts to the specifications of the application in use and to the network setting, so its
efficiency is scenario-dependent. The communication and computational costs are functions of parameters
related to the application in use (i.e., the number $M$ of messages to be authenticated under the same key and
the length $l$ of the messages) and to the network setting (i.e., the number $k-1$ of colluding malicious
adversaries to be considered). Regarding communication cost, however, our scheme has one advantage over
previous schemes that does not depend on these parameters: it is particularly efficient in terms of
communication overhead since, unlike all existing schemes [8], [14], [15], [16], it requires only a single symbol
for tracking purposes.

We give a multicast goodput analysis to assess the impact of pollution attacks on multicast throughput 
and to show how much goodput gain our scheme offers. We show how our scheme can be used for applications 
such as content and file distribution. 

The rest of the paper is organized as follows. In Section 2, we briefly present the network coding model we
consider and define authentication codes, both in general and for network coding in particular. Section 3
presents the authentication scheme, whose security is analyzed in Section 4 and whose performance is
analyzed in Section 5. Section 6 shows how our scheme can be used for content and file distribution. Future
work is addressed in the conclusion.

2 A Network Coding Setting for Authentication Codes 

We start by introducing the multicast network coding model we are considering. Since we are not aware of 
prior work on authentication codes for network coding, we then propose a definition of authentication codes 
for multicast network coding. 

2.1 The multicast network coding model 

The network model we consider is an acyclic graph with unit-capacity edges and a single source $S$,
which wants to send a set of $n$ messages to $T$ destinations $D_1, \ldots, D_T$. Messages are seen as sequences
of elements of a finite field with $q$ elements, denoted by $\mathbb{F}_q$. Each edge $e$ of the graph carries one symbol
$y(e) \in \mathbb{F}_q$ at a time. At each node of the graph, the symbols on its outgoing edges are linear combinations,
called local encodings, of the symbols entering the node through its incoming edges. If $x_1, \ldots, x_n$ denote
the symbols to be sent by the source $S$ at a time, then by induction, on any edge $e$, $y(e)$ is a linear
combination of the source symbols, that is, $y(e) = \sum_{i=1}^{n} g_i(e) x_i$, where the coefficients $g_i(e)$ describe
the coding operation. The vector $g(e) = [g_1(e), \ldots, g_n(e)]$ is thus called the global encoding vector along the
edge $e$. We can describe the messages received by a node in the network with $h$ incoming edges $e_1, \ldots, e_h$
by the following matrix equation:




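$$
\begin{pmatrix} y(e_1) \\ \vdots \\ y(e_h) \end{pmatrix}
=
\begin{pmatrix} g_1(e_1) & \cdots & g_n(e_1) \\ \vdots & & \vdots \\ g_1(e_h) & \cdots & g_n(e_h) \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
= G \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},
$$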

where $G$ is called a transfer matrix. In particular, the destination nodes $D_i$, $i = 1, \ldots, T$, can recover the
source symbols $x_1, \ldots, x_n$, assuming that their respective transfer matrices $G_{D_i}$ have rank $n$, $i = 1, \ldots, T$ (this
also means $h \geq n$). In this paper, we are not concerned with the existence of global encoding vectors:
we assume that we deal with a network for which suitable linear encoding vectors exist, so that
destination nodes are able to decode the received packets correctly.

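We can packetize the symbols $y(e)$ flowing on each edge $e$ into vectors $\mathbf{y}(e) = [y_1(e), \ldots, y_N(e)] \in \mathbb{F}_q^N$,
and likewise, the source symbols $x_i$ can be grouped as $\mathbf{x}_i = [x_{i,1}, \ldots, x_{i,N}] \in \mathbb{F}_q^N$, so that the equation at a
node with $h$ incoming edges can be rewritten as

$$
\begin{pmatrix} \mathbf{y}(e_1) \\ \vdots \\ \mathbf{y}(e_h) \end{pmatrix}
= G \begin{pmatrix} \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_n \end{pmatrix},
$$

or equivalently

$$
\begin{pmatrix} \mathbf{y}(e_1) \\ \vdots \\ \mathbf{y}(e_h) \end{pmatrix}
= G \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,N} \end{pmatrix}
\in \mathbb{F}_q^{h \times N}, \qquad (1)
$$

where $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are the $n$ messages of length $N$ to be sent by the source.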
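To make the packetized model concrete, here is a minimal sketch, assuming a prime field $\mathbb{F}_p$ with $p = 251$ as a stand-in for $\mathbb{F}_q$ and a destination with exactly $h = n$ incoming edges. It forms the received packets as $G$ times the message matrix, as in (1), and decodes them by Gaussian elimination modulo $p$. The helper names `matmul` and `decode`, the field size, and the sample matrices are illustrative choices, not part of the original text.

```python
P = 251  # assumed prime field size standing in for q


def matmul(A, B, p=P):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]


def decode(G, Y, p=P):
    """Recover X from Y = G X over F_p (G square, rank n required)."""
    n = len(G)
    aug = [list(G[i]) + list(Y[i]) for i in range(n)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] % p), None)
        if pivot is None:
            raise ValueError("transfer matrix has rank < n: cannot decode")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        inv = pow(aug[c][c], -1, p)          # modular inverse (Python 3.8+)
        aug[c] = [v * inv % p for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# n = 3 source messages of length N = 4, and a transfer matrix with det = 1
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
G = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
Y = matmul(G, X)             # rows play the role of y(e_1), y(e_2), y(e_3)
print(decode(G, Y) == X)     # True
```

Decoding raises an error exactly when the transfer matrix has rank less than $n$, which is the decodability condition stated above.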

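Example 1 Consider the small network (taken from [3]) shown in Fig. 1, where the source $S$ wants to
send $n = 3$ messages $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3 \in \mathbb{F}_q^N$ to $T = 1$ destination $D_1$, through two nodes $R_1$ and $R_2$.
The source computes the vectors

$$
\begin{pmatrix} \mathbf{y}(e_1) \\ \mathbf{y}(e_2) \\ \mathbf{y}(e_3) \end{pmatrix}
= \begin{pmatrix} g_1(e_1) & g_2(e_1) & g_3(e_1) \\ g_1(e_2) & g_2(e_2) & g_3(e_2) \\ g_1(e_3) & g_2(e_3) & g_3(e_3) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix}
$$

as linear combinations of its three messages $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ and sends each $\mathbf{y}(e_i)$ over the edge $e_i$, $i = 1, 2, 3$.
The node $R_1$ receives $\mathbf{y}(e_1)$ and $\mathbf{y}(e_2)$, which it encodes as follows, using the local encoding coefficients
$(\alpha_{11}, \alpha_{12})$ for the edge $e_4$ and $(\alpha_{21}, \alpha_{22})$ for the edge $e_5$: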






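$$
\begin{pmatrix} \mathbf{y}(e_4) \\ \mathbf{y}(e_5) \end{pmatrix}
= \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{y}(e_1) \\ \mathbf{y}(e_2) \end{pmatrix}
= \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}
\begin{pmatrix} g_1(e_1) & g_2(e_1) & g_3(e_1) \\ g_1(e_2) & g_2(e_2) & g_3(e_2) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix}
= \begin{pmatrix} g_1(e_4) & g_2(e_4) & g_3(e_4) \\ g_1(e_5) & g_2(e_5) & g_3(e_5) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix},
$$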
Figure 1: A small example of a network with one source $S$, one destination $D_1$ and two relay nodes $R_1$ and
$R_2$. Global encoding vectors have coefficients in $\mathbb{F}_2$.



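while the node $R_2$ gets

$$
\begin{pmatrix} \mathbf{y}(e_3) \\ \mathbf{y}(e_4) \end{pmatrix}
= \begin{pmatrix} g_1(e_3) & g_2(e_3) & g_3(e_3) \\ g_1(e_4) & g_2(e_4) & g_3(e_4) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix}.
$$

Denote by $(\beta_{11}, \beta_{12})$ and $(\beta_{21}, \beta_{22})$ the local encoding coefficients of node $R_2$ corresponding to
the edges $e_6$ and $e_7$, respectively. Finally, the destination gets

$$
\begin{aligned}
\begin{pmatrix} \mathbf{y}(e_5) \\ \mathbf{y}(e_6) \\ \mathbf{y}(e_7) \end{pmatrix}
&= \begin{pmatrix} \alpha_{21} & \alpha_{22} & 0 & 0 \\ 0 & 0 & \beta_{11} & \beta_{12} \\ 0 & 0 & \beta_{21} & \beta_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{y}(e_1) \\ \mathbf{y}(e_2) \\ \mathbf{y}(e_3) \\ \mathbf{y}(e_4) \end{pmatrix} \\
&= \begin{pmatrix} \alpha_{21} & \alpha_{22} & 0 & 0 \\ 0 & 0 & \beta_{11} & \beta_{12} \\ 0 & 0 & \beta_{21} & \beta_{22} \end{pmatrix}
\begin{pmatrix} g_1(e_1) & g_2(e_1) & g_3(e_1) \\ g_1(e_2) & g_2(e_2) & g_3(e_2) \\ g_1(e_3) & g_2(e_3) & g_3(e_3) \\
\alpha_{11} g_1(e_1) + \alpha_{12} g_1(e_2) & \alpha_{11} g_2(e_1) + \alpha_{12} g_2(e_2) & \alpha_{11} g_3(e_1) + \alpha_{12} g_3(e_2) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix} \\
&= \begin{pmatrix} g_1(e_5) & g_2(e_5) & g_3(e_5) \\ g_1(e_6) & g_2(e_6) & g_3(e_6) \\ g_1(e_7) & g_2(e_7) & g_3(e_7) \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix}
= G \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \mathbf{x}_3 \end{pmatrix}.
\end{aligned}
$$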




The destination $D_1$ can decode if the global vectors have been chosen such that the transfer matrix $G$ is
invertible. The global vectors are determined by the local encoding coefficients $\alpha_{ij}$ at $R_1$ and $\beta_{ij}$
at $R_2$, $i, j = 1, 2$. There are many configurations over $\mathbb{F}_2$ for which $G$ is invertible. Take for example
$g_1(e_1) = g_2(e_2) = g_3(e_3) = 1$ and $g_1(e_2) = g_2(e_1) = g_3(e_1) = g_1(e_3) = g_2(e_3) = g_3(e_2) = 0$, with
$\beta_{12} = \beta_{21} = \alpha_{12} = \alpha_{21} = 1$ and $\beta_{11} = \beta_{22} = \alpha_{11} = \alpha_{22} = 0$ (this makes the transfer matrix
equal to the identity matrix).
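The configuration above can be checked mechanically. The sketch below assumes, as the equations of this example indicate, that $e_1, e_2$ go from $S$ to $R_1$, $e_3$ from $S$ to $R_2$, $e_4$ from $R_1$ to $R_2$, $e_5$ from $R_1$ to $D_1$, and $e_6, e_7$ from $R_2$ to $D_1$. It propagates global encoding vectors over $\mathbb{F}_2$ from the local coefficients and prints the destination's transfer matrix; the function names are hypothetical.

```python
def add(u, v):
    """Componentwise sum over F_2."""
    return [(a + b) % 2 for a, b in zip(u, v)]


def scale(c, u):
    """Scalar multiple over F_2."""
    return [(c * a) % 2 for a in u]


def global_vectors(alpha, beta):
    """Global encoding vectors of Example 1 from the local coefficients at R1, R2."""
    g = {1: [1, 0, 0], 2: [0, 1, 0], 3: [0, 0, 1]}   # source edges e1, e2, e3
    (a11, a12), (a21, a22) = alpha
    (b11, b12), (b21, b22) = beta
    g[4] = add(scale(a11, g[1]), scale(a12, g[2]))    # R1 -> R2
    g[5] = add(scale(a21, g[1]), scale(a22, g[2]))    # R1 -> D1
    g[6] = add(scale(b11, g[3]), scale(b12, g[4]))    # R2 -> D1
    g[7] = add(scale(b21, g[3]), scale(b22, g[4]))    # R2 -> D1
    return g


# alpha_12 = alpha_21 = beta_12 = beta_21 = 1, all other local coefficients 0
g = global_vectors(alpha=((0, 1), (1, 0)), beta=((0, 1), (1, 0)))
G = [g[5], g[6], g[7]]      # transfer matrix at the destination D1
print(G)                    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```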

2.2 Authentication codes for network coding 

Since we are not aware of prior work on network coding authentication codes, let us start by recalling the
setting of classical multi-receiver authentication schemes, as proposed by Desmedt et al. In [17], the authors
proposed a model for unconditionally secure authentication where one transmitter communicates with multiple
receivers who cannot all be trusted. In this scenario, the transmitter first appends a tag to a common message,
which is then broadcast to all the receivers, who can separately verify the authenticity of the tagged message
using their own private secret keys. Among the receivers is a group of malicious receivers, who use their
secret keys and all the previous messages to construct fake messages. A $(k, V)$ multi-receiver authentication
system refers to a scheme with $V$ receivers, among which at most $k-1$ can cheat. The malicious
nodes can perform either an impersonation attack, if they try to construct a valid tagged message without
having seen any previously transmitted message, or a substitution attack, if they first listen to at least one
tagged message before trying to forge a tag in such a way that the receiver will accept the tagged message.
Perfect protection is obtained if the best chance of success of the attack is $1/|\mathcal{T}|$, where $|\mathcal{T}|$ is the size of the
tag space, namely, if the attacker cannot do better than guessing a tag uniformly at random.

In [18], the scheme of Desmedt et al. has been generalized to the case where the same key can be used
to authenticate up to $M$ messages.

The network coding scenario that we consider in this paper is a multicast setting, where one source wants 
to send a set of messages to T destinations. In order to propose a definition of network coding authentication 
scheme, let us first understand the main differences with respect to the classical multi-receiver scenario: 

1. The source does not broadcast the same message on all its outgoing edges, but sends different linear
combinations of the $n$ messages $\mathbf{x}_1, \ldots, \mathbf{x}_n$. This means that the key used by the source to sign the
messages will be used more than once, in fact at least as many times as there are outgoing edges from
the source.

2. We are interested in a more general network scenario, where intermediate nodes play a role. In
particular, in the context of pollution attacks it is relevant that not only destination nodes but also
intermediate nodes can check the authenticity of the packets. We call such nodes in the network
verifying nodes. This set may include some or all of the destination nodes $D_1, \ldots, D_T$. This makes
a big difference in network coding: while the destination nodes have a transfer matrix that lets them
recover the messages sent, regular intermediate nodes do not, and must perform the authentication
check without being able, a priori, to decode.

Based on the above considerations, we propose the following definition for multicast network coding. 

Definition 1 We call a $(k, V, M)$ network coding authentication code an authentication code for $V$
verifying nodes that is unconditionally secure against substitution and impersonation attacks performed by
a group of at most $k-1$ adversaries, possibly belonging to the verifying nodes, where the source can use the
same key at most $M$ times.

3 The Authentication Scheme 

Recall that we have a single source $S$, which wants to multicast $n$ messages to $T$ destinations $D_1, \ldots, D_T$.
We denote the messages by $s_1, \ldots, s_n$ when referring to the actual data to be sent, while we keep the
notation $\mathbf{x}_1, \ldots, \mathbf{x}_n$ for the whole packets, including the authentication tag. Each message $s_i$ has
length $l$, $s_i = (s_{i,1}, \ldots, s_{i,l})$, so that while each symbol $s_{i,j}$ belongs to $\mathbb{F}_q$, we can see the whole message as
an element of $\mathbb{F}_q^l \cong \mathbb{F}_{q^l}$. We also assume a set of nodes $R_1, \ldots, R_V$ that can verify the authentication.
A priori, this set can include the destinations, but it can also be larger. In the context of pollution attacks,
we will typically assume that $V \gg T$.

We now present our $(k, V, M)$ network coding authentication scheme and discuss its efficiency. Security
will be analyzed in the next section.



3.1 Set-up and authentication tag generation 

We propose the following authentication scheme: 

1. Key generation: A trusted authority randomly generates $M + 1$ polynomials $P_0(x), \ldots, P_M(x) \in
\mathbb{F}_{q^l}[x]$ and chooses $V$ distinct values $x_1, \ldots, x_V \in \mathbb{F}_{q^l}$. These polynomials are of degree $k - 1$, and we
denote them by

$$
P_i(x) = a_{i0} + a_{i1} x + a_{i2} x^2 + \cdots + a_{i,k-1} x^{k-1}, \qquad i = 0, \ldots, M.
$$

2. Key distribution: The trusted authority gives the source $S$ the $M + 1$ polynomials
$(P_0(x), \ldots, P_M(x))$ as its private key, and gives each verifier $R_i$ the $M + 1$ polynomials evaluated at
$x = x_i$, namely $(P_0(x_i), \ldots, P_M(x_i))$, $i = 1, \ldots, V$, as its private key. The values $x_1, \ldots, x_V$ are
made public. The keys can be given to the nodes at the same time as their local encoding vectors.

3. Authentication tag: Let us assume that the source wants to send $n$ data messages $s_1, \ldots, s_n \in \mathbb{F}_q^l$.

The source computes the polynomial

$$
A_{s_i}(x) = P_0(x) + s_i P_1(x) + s_i^{q} P_2(x) + \cdots + s_i^{q^{M-1}} P_M(x) \in \mathbb{F}_{q^l}[x],
$$

which forms the authentication tag of $s_i$, $i = 1, \ldots, n$. The packets actually sent by the
source are of the form

$$
\mathbf{x}_i = [1, s_i, A_{s_i}(x)] \in \mathbb{F}_q^{1+l+kl}, \qquad i = 1, \ldots, n.
$$

The tag, represented by its $k$ coefficients in $\mathbb{F}_{q^l}$ (that is, $kl$ symbols of $\mathbb{F}_q$), is attached after the
message, and one symbol equal to 1 is added at the beginning, which will be used to keep track of the
network coding coefficients.

The number $M + 1$ of polynomials $P_i(x)$ is related to the number of uses of the key, while the degree
$k - 1$ corresponds to the size of the attackers' coalition.

Note that although making the values $x_1, \ldots, x_V$ public could a priori help an attacker, we choose to make
them public, and prove that this actually does not help the attacker, in order to minimize the amount of
secret information given to the nodes.
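A minimal sketch of key generation, key distribution and tag computation follows, assuming $q = 2$ and $l = 8$, so that a message is one byte viewed as an element of $\mathbb{F}_{2^8}$ with the modulus $x^8 + x^4 + x^3 + x + 1$ (an assumed choice). With $q = 2$, each power $s^{q^j}$ in the tag is obtained by squaring $j$ times. All function names are hypothetical, and Python's `random` module only stands in for the trusted authority's randomness; it is not a cryptographic source.

```python
import random

MOD = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over F_2 (assumed modulus)


def gf_mul(a, b):
    """Multiply two elements of F_{2^8} encoded as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r


def frob(s, j):
    """Return s^(q^j) for q = 2, i.e. square s j times."""
    for _ in range(j):
        s = gf_mul(s, s)
    return s


def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_{k-1} x^{k-1} with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def keygen(k, M, V, rng):
    """Trusted authority: M+1 random polynomials of degree at most k-1, V public points."""
    polys = [[rng.randrange(256) for _ in range(k)] for _ in range(M + 1)]
    points = rng.sample(range(256), V)  # distinct values, so V <= q^l = 256
    verifier_keys = [[poly_eval(p, x) for p in polys] for x in points]
    return polys, points, verifier_keys


def tag(polys, s):
    """Coefficients of A_s(x) = P_0 + s P_1 + s^q P_2 + ... + s^(q^(M-1)) P_M."""
    out = list(polys[0])
    for j in range(1, len(polys)):
        c = frob(s, j - 1)
        out = [o ^ gf_mul(c, a) for o, a in zip(out, polys[j])]
    return out


def make_packet(polys, s):
    """Packet [1, s, b_0, ..., b_{k-1}]: tracking symbol, message, tag."""
    return [1, s] + tag(polys, s)


rng = random.Random(0)  # illustration only, not a cryptographic source
polys, points, vkeys = keygen(k=3, M=4, V=5, rng=rng)
pkt = make_packet(polys, 0x53)
print(len(pkt))  # 5: tracking symbol, message, and k = 3 tag coefficients
```

Each list entry of `pkt` other than the first is an element of $\mathbb{F}_{2^8}$; counted in symbols of $\mathbb{F}_2$, the packet has $1 + l + kl = 33$ entries for $k = 3$.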



3.2 Verification and correctness of the authentication tag 

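In order to discuss the authentication check, let us recall from (1) the received tagged vector at a
node $R_i$ with $h$ incoming edges $e_{i_1}, \ldots, e_{i_h}$ when the source is sending $\mathbf{x}_j = [1, s_j, A_{s_j}(x)] \in \mathbb{F}_q^{1+l+kl}$, $j = 1, \ldots, n$:

$$
\begin{pmatrix} \mathbf{y}(e_{i_1}) \\ \vdots \\ \mathbf{y}(e_{i_h}) \end{pmatrix}
= \begin{pmatrix} g_1(e_{i_1}) & \cdots & g_n(e_{i_1}) \\ \vdots & & \vdots \\ g_1(e_{i_h}) & \cdots & g_n(e_{i_h}) \end{pmatrix}
\begin{pmatrix} 1 & s_1 & A_{s_1}(x) \\ \vdots & \vdots & \vdots \\ 1 & s_n & A_{s_n}(x) \end{pmatrix}.
$$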



Recall that a verifying node $R_i$ further has a private key given by

$$
P_0(x_i), \ldots, P_M(x_i).
$$
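For each incoming edge $e_k$, $k = i_1, \ldots, i_h$, the node $R_i$ can thus compute the product of the received data
on the edge with the private keys, as follows:

$$
P_0(x_i) \sum_{j=1}^{n} g_j(e_k), \qquad P_1(x_i) \sum_{j=1}^{n} g_j(e_k) s_j,
$$

and similarly for the key $P_2(x_i)$:

$$
P_2(x_i) \Big( \sum_{j=1}^{n} g_j(e_k) s_j \Big)^{q} = P_2(x_i) \sum_{j=1}^{n} g_j(e_k) s_j^{q},
$$

and for the other keys $P_m(x_i)$, $m = 3, \ldots, M$, up to

$$
P_M(x_i) \Big( \sum_{j=1}^{n} g_j(e_k) s_j \Big)^{q^{M-1}} = P_M(x_i) \sum_{j=1}^{n} g_j(e_k) s_j^{q^{M-1}}.
$$

These equalities hold because raising to the power $q$ is additive in $\mathbb{F}_{q^l}$ (as $q$ is a power of the
characteristic), and because every coding coefficient $g_j(e_k) \in \mathbb{F}_q$ satisfies $g_j(e_k)^q = g_j(e_k)$.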

On the other hand, it can evaluate the polynomial

$$
\sum_{j=1}^{n} g_j(e_k) A_{s_j}(x)
$$

at $x_i$, which is public. This yields

$$
\begin{aligned}
\sum_{j=1}^{n} g_j(e_k) A_{s_j}(x_i)
&= \sum_{j=1}^{n} g_j(e_k) \big( P_0(x_i) + s_j P_1(x_i) + s_j^{q} P_2(x_i) + \cdots + s_j^{q^{M-1}} P_M(x_i) \big) \\
&= P_0(x_i) \sum_{j=1}^{n} g_j(e_k) + P_1(x_i) \sum_{j=1}^{n} g_j(e_k) s_j + \cdots + P_M(x_i) \sum_{j=1}^{n} g_j(e_k) s_j^{q^{M-1}}.
\end{aligned}
$$

The node $R_i$ accepts a packet on its incoming edge $e_k$ if the two computations coincide, which, as we have
just shown, they do if the protocol has not been altered. Note that the verifying node does not need to decode
the message (which it may not be able to do) in order to perform the check.
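The check above can be sketched in the same assumed setting ($q = 2$, $l = 8$, messages as elements of $\mathbb{F}_{2^8}$ under the modulus $x^8 + x^4 + x^3 + x + 1$). With coding coefficients in $\mathbb{F}_2$, a coded packet is the XOR of a subset of source packets, and the verifier compares the key-side sum with the received tag evaluated at $x_i$. The function names `combine` and `verify`, the public value `x_i = 0x2A`, and the messages are hypothetical.

```python
import random

MOD = 0x11B  # assumed modulus for F_{2^8}; q = 2, l = 8


def gf_mul(a, b):
    """Multiply in F_{2^8}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= MOD
    return r


def frob(s, j):
    """s^(q^j) with q = 2."""
    for _ in range(j):
        s = gf_mul(s, s)
    return s


def poly_eval(coeffs, x):
    """Horner evaluation of a polynomial given by its coefficients."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def make_packet(polys, s):
    """[1, s, tag] with A_s(x) = P_0(x) + sum_j s^(q^(j-1)) P_j(x)."""
    tag = list(polys[0])
    for j in range(1, len(polys)):
        c = frob(s, j - 1)
        tag = [t ^ gf_mul(c, a) for t, a in zip(tag, polys[j])]
    return [1, s] + tag


def combine(packets, g):
    """Network coding over F_2: XOR of the packets whose coefficient g_j is 1."""
    out = [0] * len(packets[0])
    for gj, pkt in zip(g, packets):
        if gj:
            out = [a ^ b for a, b in zip(out, pkt)]
    return out


def verify(y, key, x_i):
    """Check a received packet y against the key (P_0(x_i), ..., P_M(x_i))."""
    lhs = gf_mul(key[0], y[0])              # P_0(x_i) * sum_j g_j
    for m in range(1, len(key)):            # P_m(x_i) * (sum_j g_j s_j)^(q^(m-1))
        lhs ^= gf_mul(key[m], frob(y[1], m - 1))
    return lhs == poly_eval(y[2:], x_i)     # compare with the received tag at x_i


rng = random.Random(1)
k, M = 2, 3
polys = [[rng.randrange(256) for _ in range(k)] for _ in range(M + 1)]
x_i = 0x2A                                   # hypothetical public value
key = [poly_eval(p, x_i) for p in polys]     # verifier's private key
packets = [make_packet(polys, s) for s in (0x10, 0xC3, 0x7E)]
y = combine(packets, [1, 0, 1])
print(verify(y, key, x_i))                   # True
y[2] ^= 1                                    # pollute the constant tag coefficient
print(verify(y, key, x_i))                   # False
```

Flipping the constant tag coefficient changes the evaluated tag by exactly 1 while leaving the key-side sum unchanged, so this particular pollution is always detected; how well arbitrary forgeries are detected is the subject of the security analysis in Section 4.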

Example 2 Consider the network of Example 1 with a $(2, 2, 3)$ authentication scheme, where $V = 2$
nodes verify the authentication tags, say the relay node $R_1$ and the destination, and the key can be
used 3 times, to protect against a coalition of at most 2 attackers (either only $R_2$, or $R_2$ and $R_1$ if the latter
gets corrupted even though it has a private key). The source $S$ wants to send two messages $s_1, s_2 \in \mathbb{F}_{2^3} \cong \mathbb{F}_2^3$,
that is, $s_1 = (s_{1,1}, s_{1,2}, s_{1,3}) \in \mathbb{F}_2^3$ and $s_2 = (s_{2,1}, s_{2,2}, s_{2,3}) \in \mathbb{F}_2^3$ with $s_{ij} \in \mathbb{F}_2 = \{0, 1\}$. During key
generation and distribution:

• The source is given the $M + 1 = 3$ polynomials $P_0(x) = a_{00} + a_{01} x$, $P_1(x) = a_{10} + a_{11} x$, and
$P_2(x) = a_{20} + a_{21} x$, of degree $k - 1 = 1$, with coefficients $a_{ij}$ in $\mathbb{F}_{2^3}$.

• The values $x_1, x_2 \in \mathbb{F}_{2^3}$ are made public.

• The relay node $R_1$ receives the secret values $P_0(x_1), P_1(x_1), P_2(x_1)$ as its private key.

• The destination node $D_1$ receives the secret values $P_0(x_2), P_1(x_2), P_2(x_2)$ as its private key.
parameters              | notation                            | symb / item | total in $\mathbb{F}_q$
source private keys     | $P_i(x)$, $i = 0, \ldots, M$        | $k$         | $k(M+1)l$
public values           | $x_i$, $i = 1, \ldots, V$           | $1$         | $Vl$
verifiers' private keys | $P_j(x_i)$, $j = 0, \ldots, M$      | $M + 1$     | $V(M+1)l$
tags                    | $A_{s_i}(x)$, $i = 1, \ldots, n$    | $kl$        | $nkl$

Table 1: Sizes for the keys and tags of the proposed $(k, V, M)$ scheme.



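The source computes two authentication tags:

$$
\begin{aligned}
A_{s_1}(x) &= P_0(x) + s_1 P_1(x) + s_1^2 P_2(x) \\
&= (a_{00} + a_{10} s_1 + a_{20} s_1^2) + x (a_{01} + a_{11} s_1 + a_{21} s_1^2) \\
&=: b_{10} + x b_{11}, \qquad b_{10}, b_{11} \in \mathbb{F}_{2^3}, \\
A_{s_2}(x) &= P_0(x) + s_2 P_1(x) + s_2^2 P_2(x) \\
&= (a_{00} + a_{10} s_2 + a_{20} s_2^2) + x (a_{01} + a_{11} s_2 + a_{21} s_2^2) \\
&=: b_{20} + x b_{21}, \qquad b_{20}, b_{21} \in \mathbb{F}_{2^3}.
\end{aligned}
$$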

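The two packets to be sent are

$$
\mathbf{x}_1 = [1, s_1, A_{s_1}(x)] = [1, s_{1,1}, s_{1,2}, s_{1,3}, b_{10}, b_{11}] \in (\mathbb{F}_2)^{10},
$$
$$
\mathbf{x}_2 = [1, s_2, A_{s_2}(x)] = [1, s_{2,1}, s_{2,2}, s_{2,3}, b_{20}, b_{21}] \in (\mathbb{F}_2)^{10},
$$

where each $b_{ij} \in \mathbb{F}_{2^3}$ occupies three symbols of $\mathbb{F}_2$, giving $1 + 3 + 2 \cdot 3 = 10$ symbols per packet.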

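The first node $R_1$ has two input edges $e_1, e_2$, and its received vector is given by

$$
\begin{aligned}
\begin{pmatrix} \mathbf{y}(e_1) \\ \mathbf{y}(e_2) \end{pmatrix}
&= \begin{pmatrix} g_1(e_1) & g_2(e_1) \\ g_1(e_2) & g_2(e_2) \end{pmatrix}
\begin{pmatrix} 1 & s_1 & A_{s_1}(x) \\ 1 & s_2 & A_{s_2}(x) \end{pmatrix} \\
&= \begin{pmatrix}
g_1(e_1) + g_2(e_1) & g_1(e_1) s_1 + g_2(e_1) s_2 & g_1(e_1) A_{s_1}(x) + g_2(e_1) A_{s_2}(x) \\
g_1(e_2) + g_2(e_2) & g_1(e_2) s_1 + g_2(e_2) s_2 & g_1(e_2) A_{s_1}(x) + g_2(e_2) A_{s_2}(x)
\end{pmatrix}.
\end{aligned}
$$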

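The public data is $x_1, x_2$. Using $x_1$ and its private key $(P_0(x_1), P_1(x_1), P_2(x_1))$, $R_1$ can compute
from $\mathbf{y}(e_1)$ the following three terms:

$$
P_0(x_1)\big(g_1(e_1) + g_2(e_1)\big), \qquad P_1(x_1)\big(g_1(e_1) s_1 + g_2(e_1) s_2\big),
$$

and

$$
P_2(x_1)\big(g_1(e_1) s_1 + g_2(e_1) s_2\big)^2 = P_2(x_1)\big(g_1(e_1)^2 s_1^2 + g_2(e_1)^2 s_2^2\big) = P_2(x_1)\big(g_1(e_1) s_1^2 + g_2(e_1) s_2^2\big),
$$

where the first equality holds because squaring is additive in characteristic 2, and the second because
$g_j(e_1) \in \mathbb{F}_2$ satisfies $g_j(e_1)^2 = g_j(e_1)$. Their sum gives

$$
P_0(x_1)\big(g_1(e_1) + g_2(e_1)\big) + P_1(x_1)\big(g_1(e_1) s_1 + g_2(e_1) s_2\big) + P_2(x_1)\big(g_1(e_1) s_1^2 + g_2(e_1) s_2^2\big). \qquad (2)
$$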

Since $R_1$ has also received $g_1(e_1) A_{s_1}(x) + g_2(e_1) A_{s_2}(x)$, it can evaluate this polynomial at $x_1$ and check whether
$g_1(e_1) A_{s_1}(x_1) + g_2(e_1) A_{s_2}(x_1)$ is equal to the sum (2). If so, the node $R_1$ accepts the authentication tag
and re-encodes the packet; otherwise, the packet is discarded. A similar check is performed on $e_2$, and by the
destination on its incoming edges using its own private key.
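Example 2 is small enough to check exhaustively. The sketch below builds $\mathbb{F}_{2^3}$ with the modulus $x^3 + x + 1$ (an assumed choice) and uses hypothetical secret coefficients $a_{ij}$ and a hypothetical public value $x_1$. For every pair of messages $s_1, s_2 \in \mathbb{F}_{2^3}$ and every pair of coefficients $g_1(e_1), g_2(e_1) \in \mathbb{F}_2$, it compares the sum (2) with the evaluated combined tag. The last line shows why the coefficients must lie in $\mathbb{F}_2$: for a coefficient outside $\mathbb{F}_2$, squaring no longer commutes with scaling.

```python
MOD = 0b1011  # x^3 + x + 1, irreducible over F_2, giving F_8 (assumed choice)


def mul8(a, b):
    """Multiply in F_8, elements encoded as 3-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r


def ev(c0, c1, x):
    """Evaluate c0 + c1 x in F_8."""
    return c0 ^ mul8(c1, x)


# Hypothetical secret polynomials P_0, P_1, P_2 of degree 1, and public value x_1.
P = [(3, 5), (6, 1), (2, 7)]
x1 = 4
key = [ev(a0, a1, x1) for a0, a1 in P]          # R_1's private key


def tag(s):
    """(b_0, b_1) with A_s(x) = P_0 + s P_1 + s^2 P_2 = b_0 + b_1 x."""
    s2 = mul8(s, s)
    b0 = P[0][0] ^ mul8(s, P[1][0]) ^ mul8(s2, P[2][0])
    b1 = P[0][1] ^ mul8(s, P[1][1]) ^ mul8(s2, P[2][1])
    return b0, b1


ok = True
for s1 in range(8):
    for s2 in range(8):
        for g1 in (0, 1):
            for g2 in (0, 1):
                (u0, u1), (v0, v1) = tag(s1), tag(s2)
                first = g1 ^ g2                           # sum of g_j
                s_comb = mul8(g1, s1) ^ mul8(g2, s2)      # sum of g_j s_j
                t0 = mul8(g1, u0) ^ mul8(g2, v0)          # combined tag
                t1 = mul8(g1, u1) ^ mul8(g2, v1)
                lhs = (mul8(key[0], first) ^ mul8(key[1], s_comb)
                       ^ mul8(key[2], mul8(s_comb, s_comb)))   # the sum (2)
                ok &= lhs == ev(t0, t1, x1)
print(ok)  # True

# With the coefficient x (encoded as 2), (x * s)^2 != x * s^2 already for s = 1.
print(mul8(mul8(2, 1), mul8(2, 1)) == mul8(2, mul8(1, 1)))  # False
```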

3.3 Parameters and efficiency 

We discuss the efficiency of the proposed scheme in terms of communication, computation, and storage
costs. The different parameters involved are summarized in Table 1.

There are two classes of parameters. The first class is fixed by the network, namely the number $T$ of destination
nodes, the network code alphabet $\mathbb{F}_q$, the length $l$ of the data packets, and the number $n$ of messages to be
sent by the source. The second class consists of the security parameters $k$, $V$ and $M$, which depend first on
the network parameters:
• Constraints on $V$: We will typically take $V \gg T$, which means that more nodes than just the
destinations will check the authentication tags. We could imagine $V < T$ if we do not even want all the
destinations to check the authentication of their packets. However, our goal is to have enough nodes
in the network (though not necessarily all of them) verifying the integrity of the packets to prevent the
propagation of polluted packets. We further need $q^l \geq V$, since private verification keys are obtained
by evaluating the polynomials at the distinct values $x_i$, $i = 1, \ldots, V$. If $V > q^l$, then we are forced to use
some values of $\mathbb{F}_{q^l}$ more than once, and the private keys are no longer unique. Thus $q^l \geq V \gg T$.

• Constraints on $M$: We assume that $M \geq n$, so that all the messages to be sent within one encoding
round can be protected with the same key.

The communication cost of the scheme mainly depends on the size $|A_{s_i}|$, $i = 1, \ldots, n$, of the authentication
tag, which is $O(kl)$ since the tag has length $kl$; we also have to account for the one extra symbol prepended
to the data vectors at the source.

The computational costs involve computing and appending the tag at the source, and verifying the tag 
at some intermediate nodes and at the destinations. 

• Cost at the source: To create a tag for a message $s_i \in \mathbb{F}_{q^l}$, recall that the source computes
the polynomial

$$
A_{s_i}(x) = P_0(x) + s_i P_1(x) + s_i^{q} P_2(x) + \cdots + s_i^{q^{M-1}} P_M(x) \in \mathbb{F}_{q^l}[x],
$$

which involves $M - 1$ exponentiations in $\mathbb{F}_{q^l}$ to compute $s_i^{q^j}$, $j = 1, \ldots, M-1$, and then $kM$
multiplications in $\mathbb{F}_{q^l}$ to get $P_j(x) s_i^{q^{j-1}}$, $j = 1, \ldots, M$. This is repeated for each of the $n$ messages $s_i$,
$i = 1, \ldots, n$.

• Cost at the verifying nodes: A verifying node $R_i$ needs to do two things to check the tag. First, it
computes

$$
P_0(x_i) \sum_{j=1}^{n} g_j(e_k), \quad P_1(x_i) \sum_{j=1}^{n} g_j(e_k) s_j, \quad \ldots, \quad P_M(x_i) \Big( \sum_{j=1}^{n} g_j(e_k) s_j \Big)^{q^{M-1}},
$$

which takes $M - 1$ exponentiations in $\mathbb{F}_{q^l}$ and $M + 1$ multiplications in $\mathbb{F}_{q^l}$. It then evaluates the
polynomial arrived on its incoming edge,

$$
\sum_{j=1}^{n} g_j(e_k) A_{s_j}(x) \in \mathbb{F}_{q^l}[x],
$$

at the public value $x_i \in \mathbb{F}_{q^l}$. Since the polynomial is of degree $k - 1$, its evaluation requires $k - 2$
exponentiations in $\mathbb{F}_{q^l}$ to compute $x_i^j$, $j = 2, \ldots, k-1$, and $k - 1$ multiplications in $\mathbb{F}_{q^l}$ to multiply each $x_i^j$,
$j = 1, \ldots, k-1$, by the corresponding coefficient of the polynomial. This is done for each of the incoming edges.

Finally, the storage cost consists of the size of the keys, that is, $M + 1$ polynomials with $k$ coefficients in
$\mathbb{F}_{q^l}$ for the source, and the $M + 1$ polynomials evaluated at one value of $\mathbb{F}_{q^l}$, yielding $M + 1$ values in
$\mathbb{F}_{q^l}$, for each of the verifying nodes.

All the costs of the proposed scheme are summarized in Table 2.
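As a convenience, the entries of Tables 1 and 2 and the constraints $q^l \geq V$ and $M \geq n$ can be transcribed into a small calculator. The function name and dictionary keys are hypothetical, and the formulas are copied from the tables rather than re-derived. The usage line takes the sizes of Example 2 ($n = 2$, $k = 2$, $V = 2$, $l = 3$, $q = 2$, $h = 2$) with $M = 2$, matching its three polynomials.

```python
def scheme_costs(n, k, V, M, l, h, q):
    """Sizes and operation counts of Tables 1 and 2, counted in symbols of F_q."""
    # Constraints from Section 3.3: distinct public values and one key per round.
    if q ** l < V:
        raise ValueError("need q^l >= V distinct public values x_1, ..., x_V")
    if M < n:
        raise ValueError("need M >= n to tag all n messages of a round")
    return {
        "source_keys": k * (M + 1) * l,                # Table 1
        "public_values": V * l,
        "verifier_keys_total": V * (M + 1) * l,
        "tags_total": n * k * l,
        "tag_size": k * l,                             # Table 2
        "communication": k * l + 1,                    # tag plus tracking symbol
        "tag_exp": n * (M - 1) * l,
        "tag_mult": n * k * M * l,
        "verify_exp": ((M - 1) + k - 2) * l * h,
        "verify_mult": ((M + 1) + k - 1) * l * h,
        "storage_source": (M + 1) * l * k,
        "storage_verifier": (M + 1) * l,
    }


c = scheme_costs(n=2, k=2, V=2, M=2, l=3, h=2, q=2)
print(c["communication"], c["storage_source"], c["storage_verifier"])  # 7 18 9
```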

4 Security Analysis of the Authentication Scheme 

Threats come from either outside or inside opponents, who can attempt either impersonation or
substitution attacks. Outside opponents are assumed to be able to see the data on the incoming edges of
some of the intermediate nodes. Inside opponents of course see the messages transiting through them; the
difference is that some of them may actually be verifying nodes, and can thus use their own private
keys to forge a substitution attack. The analysis focuses on the worst-case scenario, namely a coalition of
inside malicious nodes in possession of private keys trying to perform a substitution attack, that is, to send
a fake packet after observing tagged messages, in such a way that a node which checks for authentication will
accept the forged authentication tag.

quantity                             | cost
Tag or signature size                | $kl$
Communication cost                   | $kl + 1$
Tag or signature computational cost  | $n(M-1)l$ exp, $nkMl$ mult
Verification computational cost      | $((M-1) + k - 2)lh$ exp, $((M+1) + k - 1)lh$ mult
Storage at the source                | $(M+1)lk$
Storage at the verifiers             | $(M+1)l$

Table 2: Efficiency of the proposed scheme. The parameter $h$ denotes the number of incoming edges of a
verifying node. Operations (multiplications and exponentiations) as well as numbers of symbols are counted in $\mathbb{F}_q$.

4.1 Preliminaries 

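In the following, we may write the dimensions of matrices as subscripts for clarity.
Suppose that a malicious node has $h$ incoming edges $e_{i_1}, \ldots, e_{i_h}$, with received vector

$$
\begin{aligned}
\begin{pmatrix} \mathbf{y}(e_{i_1}) \\ \vdots \\ \mathbf{y}(e_{i_h}) \end{pmatrix}
&= \begin{pmatrix} g_1(e_{i_1}) & \cdots & g_n(e_{i_1}) \\ \vdots & & \vdots \\ g_1(e_{i_h}) & \cdots & g_n(e_{i_h}) \end{pmatrix}
\begin{pmatrix} 1 & s_1 & A_{s_1}(x) \\ \vdots & \vdots & \vdots \\ 1 & s_n & A_{s_n}(x) \end{pmatrix} \\
&= \begin{pmatrix}
\sum_{j=1}^{n} g_j(e_{i_1}) & \sum_{j=1}^{n} g_j(e_{i_1}) s_j & \sum_{j=1}^{n} g_j(e_{i_1}) A_{s_j}(x) \\
\vdots & \vdots & \vdots \\
\sum_{j=1}^{n} g_j(e_{i_h}) & \sum_{j=1}^{n} g_j(e_{i_h}) s_j & \sum_{j=1}^{n} g_j(e_{i_h}) A_{s_j}(x)
\end{pmatrix},
\end{aligned}
$$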
from which it tries to learn about the source's private keys. If we write

$$
A_{s_j}(x) = P_0(x) + s_j P_1(x) + \cdots + s_j^{q^{M-1}} P_M(x) = b_{j0} + b_{j1} x + \cdots + b_{j,k-1} x^{k-1} \in \mathbb{F}_{q^l}[x],
$$

we have, for all incoming edges $e_m$,

$$
\sum_{j=1}^{n} g_j(e_m) A_{s_j}(x) = \sum_{j=1}^{n} g_j(e_m) \big( b_{j0} + b_{j1} x + \cdots + b_{j,k-1} x^{k-1} \big) = c_{m0} + c_{m1} x + \cdots + c_{m,k-1} x^{k-1},
$$

where

$$
c_{mi} = \sum_{j=1}^{n} g_j(e_m) b_{ji} \in \mathbb{F}_{q^l}.
$$

Thus, the malicious node actually knows $c_{mi}$, $i = 0, \ldots, k-1$, for every incoming edge $e_m$, $m = i_1, \ldots, i_h$,
and upon reception of its incoming vector, it can obtain the following system of linear equations:

$$
A_{k \times (M+1)} G_{(M+1) \times h} = C_{k \times h}. \qquad (3)
$$
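Both the matrix $G$ containing the network coding coefficients and the matrix $C$, respectively given by (relabeling the incoming edges $1, \dots, h$)

$$G = \begin{pmatrix} \sum_{j=1}^n g_j(e_{i_1}) & \cdots & \sum_{j=1}^n g_j(e_{i_h}) \\ \sum_{j=1}^n g_j(e_{i_1}) s_j & \cdots & \sum_{j=1}^n g_j(e_{i_h}) s_j \\ \vdots & & \vdots \\ \sum_{j=1}^n g_j(e_{i_1}) s_j^{q^{M-1}} & \cdots & \sum_{j=1}^n g_j(e_{i_h}) s_j^{q^{M-1}} \end{pmatrix}, \qquad C = \begin{pmatrix} c_{1,0} & \cdots & c_{h,0} \\ \vdots & & \vdots \\ c_{1,k-1} & \cdots & c_{h,k-1} \end{pmatrix},$$

are known to the malicious node (the rows of $G$ involving $s_j^{q^t}$ are obtained from $\sum_j g_j(e_m) s_j$ by raising it to the power $q^t$, since the network coding coefficients lie in $\mathbb{F}_q$), while the $k \times (M+1)$ matrix $A$ given by

$$A = \begin{pmatrix} a_{0,0} & a_{1,0} & \cdots & a_{M,0} \\ a_{0,1} & a_{1,1} & \cdots & a_{M,1} \\ \vdots & \vdots & & \vdots \\ a_{0,k-1} & a_{1,k-1} & \cdots & a_{M,k-1} \end{pmatrix}$$

is to be found. $A$ has on its $i$th column the coefficients $a_{i,0}, \dots, a_{i,k-1}$ of the $i$th secret polynomial $P_i$, and thus contains all the coefficients of the source's private keys.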
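To make system (3) concrete, the sketch below simulates it in a toy setting under stated assumptions: $q = 2$ and $\mathbb{F}_{q^l} = \mathbb{F}_{2^8}$, represented modulo the AES polynomial $x^8 + x^4 + x^3 + x + 1$ (our choice; the paper does not fix a representation), random secret coefficients, and coding coefficients drawn from $\mathbb{F}_2$. It builds $G$ only from the first two entries of each received vector, using the Frobenius step described above, and checks that $AG = C$. All function names are hypothetical.

```python
import random

POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1: F_{2^8}, i.e. q = 2 and l = 8 (assumed)


def mul(a, b):
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def frob(a, t):
    """Apply the Frobenius map t times: a -> a^(q^t) with q = 2."""
    for _ in range(t):
        a = mul(a, a)
    return a


def matmul(X, Y):
    """Matrix product over F_{2^8}; addition is XOR."""
    Z = [[0] * len(Y[0]) for _ in X]
    for r, row in enumerate(X):
        for c in range(len(Y[0])):
            acc = 0
            for i, x in enumerate(row):
                acc ^= mul(x, Y[i][c])
            Z[r][c] = acc
    return Z


def tag_coeffs(A, s, M):
    """Coefficients b_0..b_{k-1} of A_s(x) = P_0(x) + s P_1(x) + ... + s^(q^(M-1)) P_M(x).

    A[t][i] is the coefficient of x^t in P_i, so column i of A holds P_i.
    """
    powers = [1] + [frob(s, j) for j in range(M)]  # 1, s, s^q, ..., s^(q^(M-1))
    return [row[0] for row in matmul(A, [[p] for p in powers])]


def simulate(k=3, M=3, n=3, h=2, seed=0):
    """Return (A, G, C), with G and C built only from one node's received vectors."""
    rng = random.Random(seed)
    A = [[rng.randrange(256) for _ in range(M + 1)] for _ in range(k)]  # source secret
    s = [rng.randrange(256) for _ in range(n)]                          # messages
    b = [tag_coeffs(A, sj, M) for sj in s]                              # b[j][t]
    G = [[0] * h for _ in range(M + 1)]
    C = [[0] * h for _ in range(k)]
    for m in range(h):
        g = [rng.randrange(2) for _ in range(n)]  # coding coefficients of edge m, in F_2
        first = second = 0                        # first two entries of y(e_m)
        for j in range(n):
            if g[j]:
                first ^= 1
                second ^= s[j]
                for t in range(k):
                    C[t][m] ^= b[j][t]            # c_{m,t}, read off the tag part of y(e_m)
        G[0][m] = first
        for i in range(1, M + 1):
            G[i][m] = frob(second, i - 1)         # = sum_j g_j s_j^(q^(i-1)) since g_j is in F_2
    return A, G, C


if __name__ == "__main__":
    A, G, C = simulate()
    print("A G == C:", matmul(A, G) == C)
```

Drawing the coding coefficients from $\mathbb{F}_2$ is what makes the squaring step legitimate here; with a larger coefficient field the same argument uses the $q$th power instead.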

Let us now assume that $K$ nodes collaborate to make a substitution attack. Each of them first obtains vectors of data from the network, and can thus collect a system of linear equations of the form

$$A G_i = C_i, \quad i = 1, \dots, K,$$

as explained in (3). The number of columns of $G_i$ depends on the number of incoming edges $h_i$ at the $i$th corrupted node. All together, this gives a new system of linear equations of the form

$$A_{k\times(M+1)}\, G_{(M+1)\times(h_1+\dots+h_K)} = C_{k\times(h_1+\dots+h_K)}$$

with

$$G = [G_1 \; G_2 \; \cdots \; G_K], \qquad C = [C_1 \; C_2 \; \cdots \; C_K],$$

where all matrices $G$, $C$ and $A$ have coefficients in $\mathbb{F}_{q^l}$.

We now take into account that some of the nodes who are given the private keys to check the authentication could be corrupted. Since we assume a group of $K$ malicious nodes, let us furthermore assume the worst case, namely that all of them actually possess a private key $(P_0(x_i), \dots, P_M(x_i))$, where $i$ belongs to a subset of cardinality $K$ of $\{1, \dots, V\}$. Without loss of generality we can assume that $i$ goes from 1 to $K$.

Since the values $x_1, \dots, x_V$ are made public, the group of adversaries can actually build another system of linear equations which exploits their knowledge of the private keys, namely

$$X_{K\times k}\, A_{k\times(M+1)} = P_{K\times(M+1)},$$

where

$$X = \begin{pmatrix} 1 & x_1 & \cdots & x_1^{k-1} \\ 1 & x_2 & \cdots & x_2^{k-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_K & \cdots & x_K^{k-1} \end{pmatrix}$$

contains the public key values, $A$, as before, contains the coefficients of the private key to be found by the group of attackers, and

$$P = \begin{pmatrix} P_0(x_1) & P_1(x_1) & \cdots & P_M(x_1) \\ P_0(x_2) & P_1(x_2) & \cdots & P_M(x_2) \\ \vdots & \vdots & & \vdots \\ P_0(x_K) & P_1(x_K) & \cdots & P_M(x_K) \end{pmatrix}$$

contains the private keys of the corrupted nodes.

Since the polynomials $P_0, \dots, P_M$ have degree $k-1$, it is clear that $K$ can be at most $k-1$; otherwise, from the knowledge of only the private and public keys, the group of attackers can recover the source's private key, i.e., they can solve the system of equations and recover $A$ (if $K \ge k$ and the public values $x_i$ are distinct, $X$ contains an invertible $k\times k$ Vandermonde submatrix).
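The remark above can be checked numerically: with $K = k$ corrupted verifiers, each secret polynomial $P_i$ is determined by Lagrange interpolation from its $k$ values $P_i(x_1), \dots, P_i(x_k)$. The sketch below assumes, as an illustration, $\mathbb{F}_{2^8}$ with the AES reduction polynomial and distinct public values; the helper names and the parameters are hypothetical.

```python
import random

POLY = 0x11B  # F_{2^8} = F_2[x]/(x^8 + x^4 + x^3 + x + 1), an assumed representation


def mul(a: int, b: int) -> int:
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def inv(a: int) -> int:
    """Inverse of a non-zero element: a^(2^8 - 2) = a^254."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = mul(r, a)
        a = mul(a, a)
        e >>= 1
    return r


def evaluate(coeffs, x: int) -> int:
    """Horner evaluation of sum_t coeffs[t] x^t."""
    acc = 0
    for c in reversed(coeffs):
        acc = mul(acc, x) ^ c
    return acc


def interpolate(xs, ys):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(xs) taking the values ys at the distinct points xs."""
    k = len(xs)
    coeffs = [0] * k
    for i in range(k):
        basis, denom = [1], 1
        for j in range(k):
            if j != i:
                # basis *= (x - xs[j]); in characteristic 2, minus is XOR
                nxt = [0] * (len(basis) + 1)
                for d, c in enumerate(basis):
                    nxt[d + 1] ^= c
                    nxt[d] ^= mul(c, xs[j])
                basis = nxt
                denom = mul(denom, xs[i] ^ xs[j])
        scale = mul(ys[i], inv(denom))
        for d in range(k):
            coeffs[d] ^= mul(scale, basis[d])
    return coeffs


def recover_secret(xs, keys, M):
    """From k key vectors (P_0(x), ..., P_M(x)) recover every P_i, i.e. the matrix A."""
    return [interpolate(xs, [key[i] for key in keys]) for i in range(M + 1)]


if __name__ == "__main__":
    rng = random.Random(3)
    k, M = 3, 2
    P = [[rng.randrange(256) for _ in range(k)] for _ in range(M + 1)]  # coefficients of P_i
    xs = rng.sample(range(1, 256), k)                                   # distinct public values
    keys = [[evaluate(Pi, x) for Pi in P] for x in xs]                  # k corrupted verifiers
    print(recover_secret(xs, keys, M) == P)
```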

By putting together the information given by the private keys and the one gathered from all the received vectors, the group of adversaries now knows the following linear systems of equations for trying to find the source private key:

$$A_{k\times(M+1)}\, G_{(M+1)\times H} = C_{k\times H}, \qquad X_{K\times k}\, A_{k\times(M+1)} = P_{K\times(M+1)},$$

where $H = h_1 + \dots + h_K$ is the aggregated number of incoming edges for all corrupted nodes and $K \le k - 1$.



4.2 Main analysis 

Let us start this part by proving some technical lemmas. 

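Lemma 1 Consider the finite field $\mathbb{F}_{q^l}$ and the polynomial $F(x,y)$ in $\mathbb{F}_{q^l}[x,y]$ given by

$$F(x,y) = (x-\alpha_1)\cdots(x-\alpha_Q)(y-\beta_1)\cdots(y-\beta_R)$$

of degree $Q$ in $x$ and $R$ in $y$. Then there exists a non-zero $(Q+1)\times(R+1)$ matrix $A$ such that

$$\begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^Q \\ 1 & \alpha_2 & \cdots & \alpha_2^Q \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_Q & \cdots & \alpha_Q^Q \end{pmatrix} A = 0_{Q\times(R+1)} \quad\text{and}\quad A \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \beta_1 & \beta_2 & \cdots & \beta_R \\ \vdots & \vdots & & \vdots \\ \beta_1^R & \beta_2^R & \cdots & \beta_R^R \end{pmatrix} = 0_{(Q+1)\times R}.$$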

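Proof. Let us expand the products in $x$ and in $y$ of $F(x,y) = (x-\alpha_1)\cdots(x-\alpha_Q)(y-\beta_1)\cdots(y-\beta_R)$ to get

$$a(x) = (x-\alpha_1)\cdots(x-\alpha_Q) = a_0 + a_1x + \dots + a_Qx^Q$$

and

$$b(y) = (y-\beta_1)\cdots(y-\beta_R) = b_0 + b_1y + \dots + b_Ry^R.$$

Now we can write

$$F(x,y) = a(x)b(y) = (1, x, \dots, x^Q) \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_Q \end{pmatrix} (b_0, b_1, \dots, b_R) \begin{pmatrix} 1 \\ y \\ \vdots \\ y^R \end{pmatrix} = (1, x, \dots, x^Q)\, A \begin{pmatrix} 1 \\ y \\ \vdots \\ y^R \end{pmatrix}$$

for the matrix $A = (a_0, \dots, a_Q)^T(b_0, \dots, b_R)$ with coefficients in $\mathbb{F}_{q^l}$, which is non-zero since $a_Q = b_R = 1$. Since $F(\alpha_q, y) = 0$ for $1 \le q \le Q$, we have that

$$F(\alpha_q, y) = (1, \alpha_q, \dots, \alpha_q^Q)\, A \begin{pmatrix} 1 \\ y \\ \vdots \\ y^R \end{pmatrix} = 0$$

for all $y$, i.e., the polynomial in $y$ whose coefficient vector is $(1, \alpha_q, \dots, \alpha_q^Q)A$ is identically zero, which proves the first equality. The claim follows similarly by using that $F(x, \beta_r) = 0$ for $1 \le r \le R$.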
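Example 3 Take

$$F(x,y) = (x-\alpha_1)(y-\beta_1)(y-\beta_2) = (x-\alpha_1)\left(y^2 + y(-\beta_1-\beta_2) + \beta_1\beta_2\right).$$

We have that

$$F(x,y) = (1, x)\begin{pmatrix} -\alpha_1 \\ 1 \end{pmatrix}(\beta_1\beta_2, -\beta_1-\beta_2, 1)\begin{pmatrix} 1 \\ y \\ y^2 \end{pmatrix} = (1, x)\, A \begin{pmatrix} 1 \\ y \\ y^2 \end{pmatrix}, \qquad A = \begin{pmatrix} -\alpha_1\beta_1\beta_2 & \alpha_1(\beta_1+\beta_2) & -\alpha_1 \\ \beta_1\beta_2 & -(\beta_1+\beta_2) & 1 \end{pmatrix}.$$

Thus

$$F(\alpha_1, y) = (1, \alpha_1)\, A \begin{pmatrix} 1 \\ y \\ y^2 \end{pmatrix} = 0 \quad\text{with}\quad (1, \alpha_1)A = (0, 0, 0),$$

and

$$A \begin{pmatrix} 1 & 1 \\ \beta_1 & \beta_2 \\ \beta_1^2 & \beta_2^2 \end{pmatrix} = 0_{2\times 2}.$$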
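The identities of Example 3 can be spot-checked numerically. Lemma 1 holds over any finite field, so the sketch below uses the prime field $\mathbb{F}_{101}$ (that is, $q = 101$, $l = 1$) purely for readability; the values of $\alpha_1, \beta_1, \beta_2$ and the function names are arbitrary illustrations.

```python
P = 101  # prime modulus: F_101 stands in for F_{q^l} (q = 101, l = 1)


def example3_matrix(alpha1: int, beta1: int, beta2: int):
    """The 2 x 3 matrix A of Example 3, entries reduced mod P."""
    return [
        [(-alpha1 * beta1 * beta2) % P, (alpha1 * (beta1 + beta2)) % P, (-alpha1) % P],
        [(beta1 * beta2) % P, (-(beta1 + beta2)) % P, 1],
    ]


def bilinear_form(A, x: int, y: int) -> int:
    """(1, x) A (1, y, y^2)^T mod P."""
    return sum(A[i][j] * pow(x, i, P) * pow(y, j, P) for i in range(2) for j in range(3)) % P


def check(alpha1: int, beta1: int, beta2: int) -> bool:
    A = example3_matrix(alpha1, beta1, beta2)
    left = [(A[0][c] + alpha1 * A[1][c]) % P for c in range(3)]   # (1, alpha1) A
    right = [(A[i][0] + A[i][1] * b + A[i][2] * b * b) % P         # A (1, b, b^2)^T
             for i in range(2) for b in (beta1, beta2)]
    same_poly = all(                                                # (1,x) A (1,y,y^2)^T = F(x, y)
        bilinear_form(A, x, y) == ((x - alpha1) * (y - beta1) * (y - beta2)) % P
        for x in range(P) for y in range(0, P, 7)
    )
    return left == [0, 0, 0] and right == [0, 0, 0, 0] and same_poly


if __name__ == "__main__":
    print(check(alpha1=3, beta1=7, beta2=11))  # True
```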

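Lemma 2 Consider the finite field $\mathbb{F}_{q^l}$.

1. Let

$$b(y) = b_0 + b_1y + b_2y^2 + \dots + b_qy^q + \dots + b_{q^{M-1}}y^{q^{M-1}}$$

be a polynomial in $\mathbb{F}_{q^l}[y]$. If all the coefficients $b_i$ are zero except for the $M+1$ coefficients $b_0$ and $b_{q^j}$, $j = 0, \dots, M-1$, which can take any values in $\mathbb{F}_{q^l}$, then for all choices of $\gamma_1, \dots, \gamma_H$ in $\mathbb{F}_{q^l}$, there exists a polynomial $c(y) \in \mathbb{F}_{q^l}[y]$ of degree $q^{M-1} - H$ such that

$$b(y) = (y-\gamma_1)\cdots(y-\gamma_H)\,c(y),$$

provided that $H \le M$.

2. Consider the polynomial $F(x,y)$ in $\mathbb{F}_{q^l}[x,y]$ given by

$$F(x,y) = (x-\alpha_1)\cdots(x-\alpha_Q)\,b(y)$$

of degree $Q$ in $x$, where $b(y) = b_0 + b_1y + b_2y^q + \dots + b_My^{q^{M-1}}$ is as above (its $M+1$ possibly non-zero coefficients being relabeled $b_0, \dots, b_M$); in particular it is of degree $q^{M-1}$ and has $\gamma_1, \dots, \gamma_H \in \mathbb{F}_{q^l}$ as roots. Then there exists a $(Q+1)\times(M+1)$ matrix $A$ such that

$$\begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^Q \\ 1 & \alpha_2 & \cdots & \alpha_2^Q \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_Q & \cdots & \alpha_Q^Q \end{pmatrix} A = 0_{Q\times(M+1)} \quad\text{and}\quad A \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \gamma_1 & \gamma_2 & \cdots & \gamma_H \\ \gamma_1^q & \gamma_2^q & \cdots & \gamma_H^q \\ \vdots & \vdots & & \vdots \\ \gamma_1^{q^{M-1}} & \gamma_2^{q^{M-1}} & \cdots & \gamma_H^{q^{M-1}} \end{pmatrix} = 0_{(Q+1)\times H}$$

for $1 \le H \le M$.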
Proof.

1. Consider the polynomial

$$b(y) = b_0 + b_1y + b_2y^2 + \dots + b_qy^q + \dots + b_{q^{M-1}}y^{q^{M-1}},$$

where all the coefficients $b_i$ are zero except for the $M+1$ coefficients $b_0$ and $b_{q^j}$, $j = 0, \dots, M-1$, which can take any values in $\mathbb{F}_{q^l}$. For all choices of $\gamma_1, \dots, \gamma_H$ in $\mathbb{F}_{q^l}$, we can form the polynomial $d(y)$ by defining

$$d(y) = (y-\gamma_1)(y-\gamma_2)\cdots(y-\gamma_H).$$

What we claim is that, provided that $H \le M$, there exists a polynomial $c(y) \in \mathbb{F}_{q^l}[y]$ such that

$$b(y) = (y-\gamma_1)\cdots(y-\gamma_H)\,c(y) = d(y)c(y),$$

or in other words, we can choose $c(y)$ such that $(y-\gamma_1)\cdots(y-\gamma_H)c(y)$ is a polynomial whose coefficients are all zero except for $M+1$ of them, namely the constant term and the $q^j$th terms, for $j = 0, \dots, M-1$.

To prove this, let us write $d(y)$ as

$$d(y) = d_0 + d_1y + d_2y^2 + \dots + d_Hy^H.$$

The equation $d(y)c(y) = b(y)$ can be rewritten, by identifying the coefficients of $y$, as

$$\underbrace{\begin{pmatrix} d_0 & & & \\ d_1 & d_0 & & \\ \vdots & d_1 & \ddots & \\ d_H & \vdots & \ddots & d_0 \\ & d_H & & d_1 \\ & & \ddots & \vdots \\ & & & d_H \end{pmatrix}}_{D \text{ of size } (q^{M-1}+1)\times(q^{M-1}-H+1)} \begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{q^{M-1}-H} \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{q^{M-1}} \end{pmatrix}.$$

Among the $q^{M-1}+1$ coefficients $b_i$, we do not have any constraint on the constant term and the $q^j$th terms, $j = 0, \dots, M-1$, which can take any value. Our only constraints are that the other coefficients are zero. We thus care about $q^{M-1} + 1 - (M+1) = q^{M-1} - M$ of them, which means we can remove $M+1$ rows from both sides of the above system of equations. The matrix $D$ containing the coefficients $d_i$ is now a $(q^{M-1}-M)\times(q^{M-1}-H+1)$ matrix, and the right-hand side is zero. Any wanted polynomial $c(y)$ corresponds to a vector $(c_0, \dots, c_{q^{M-1}-H})$ which belongs to the kernel of $D$. For such a non-zero vector to exist, we need the kernel of $D$ to be of dimension at least 1, for which the rank $\mathrm{rk}(D)$ of $D$ must be smaller than or equal to $q^{M-1} - H$. Now we have that

$$\mathrm{rk}(D) \le \min(q^{M-1}-M,\; q^{M-1}-H+1).$$

Thus if $H \le M$ as assumed, we get that

$$\mathrm{rk}(D) \le \min(q^{M-1}-M,\; q^{M-1}-H+1) = q^{M-1}-M \le q^{M-1}-H,$$

and we are done.
2. As in the proof of Lemma 1, we first expand the product in $x$ from $F(x,y) = (x-\alpha_1)\cdots(x-\alpha_Q)b(y)$ to get

$$a(x) = (x-\alpha_1)\cdots(x-\alpha_Q) = a_0 + a_1x + \dots + a_Qx^Q.$$

Since $b(y)$ is given by

$$b(y) = (y-\gamma_1)\cdots(y-\gamma_H)\,c(y) = b_0 + b_1y + b_2y^q + \dots + b_My^{q^{M-1}},$$

we can write

$$F(x,y) = (1, x, \dots, x^Q)\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_Q \end{pmatrix}(b_0, b_1, b_2, \dots, b_M)\begin{pmatrix} 1 \\ y \\ y^q \\ \vdots \\ y^{q^{M-1}} \end{pmatrix} = (1, x, \dots, x^Q)\, A \begin{pmatrix} 1 \\ y \\ y^q \\ \vdots \\ y^{q^{M-1}} \end{pmatrix}$$

for the matrix $A$ with coefficients in $\mathbb{F}_{q^l}$. Since $F(\alpha_i, y) = 0$ for $1 \le i \le Q$, we have that

$$F(\alpha_i, y) = (1, \alpha_i, \dots, \alpha_i^Q)\, A \begin{pmatrix} 1 \\ y \\ y^q \\ \vdots \\ y^{q^{M-1}} \end{pmatrix} = 0$$

for all $y$, which proves the first equality. The claim follows similarly by using that $F(x, \gamma_h) = 0$ for $1 \le h \le H$, by the previous point of the lemma. In words, the numbers of rows and columns of the matrix $A$ are determined by the numbers of (possibly non-zero) coefficients in the polynomials $a(x)$ and $b(y)$, respectively. On the other hand, the number of rows of the matrix in the $\alpha$'s and the number of columns of the matrix in the $\gamma$'s depend on the numbers of roots of the respective polynomials.



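Example 4
• Take first $q = 2$ and $M = 3$. For any choice of $\gamma_1, \gamma_2, \gamma_3$, we can define

$$d(y) = (y-\gamma_1)(y-\gamma_2)(y-\gamma_3).$$

Now since

$$b(y) = b_0 + b_1y + b_2y^2 + b_3y^4,$$

this means that we are looking for a linear polynomial

$$c(y) = y - \gamma_4.$$

It is easy to see here that we can choose $\gamma_4 = \gamma_1 + \gamma_2 + \gamma_3$: the coefficient of $y^3$ in $d(y)c(y)$ is $-(\gamma_1+\gamma_2+\gamma_3+\gamma_4)$, which vanishes for this choice because the characteristic is 2.
• Take $q = 2$ and $M = 4$. We have for any choice of $\gamma_1, \gamma_2, \gamma_3, \gamma_4$ that

$$d(y) = (y-\gamma_1)(y-\gamma_2)(y-\gamma_3)(y-\gamma_4) = d_0 + d_1y + d_2y^2 + d_3y^3 + d_4y^4$$

with

$$\begin{aligned} d_0 &= \gamma_1\gamma_2\gamma_3\gamma_4, \\ d_1 &= -\gamma_1\gamma_2\gamma_3 - \gamma_1\gamma_2\gamma_4 - \gamma_1\gamma_3\gamma_4 - \gamma_2\gamma_3\gamma_4, \\ d_2 &= \gamma_1\gamma_2 + \gamma_1\gamma_3 + \gamma_2\gamma_3 + \gamma_1\gamma_4 + \gamma_2\gamma_4 + \gamma_3\gamma_4, \\ d_3 &= -\gamma_1 - \gamma_2 - \gamma_3 - \gamma_4, \\ d_4 &= 1. \end{aligned}$$

The polynomial $b(y)$ is given by

$$b(y) = b_0 + b_1y + b_2y^2 + b_4y^4 + b_8y^8,$$

and in order to find a polynomial $c(y) = c_0 + c_1y + c_2y^2 + c_3y^3 + c_4y^4$ (of degree $q^{M-1} - H = 4$) such that $c(y)d(y) = b(y)$, we have to solve the following system of equations, which expresses that the coefficients of $y^3$, $y^5$, $y^6$ and $y^7$ vanish:

$$\begin{pmatrix} d_3 & d_2 & d_1 & d_0 & 0 \\ 0 & d_4 & d_3 & d_2 & d_1 \\ 0 & 0 & d_4 & d_3 & d_2 \\ 0 & 0 & 0 & d_4 & d_3 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$

Clearly the dimension of the kernel is at least 1, since the matrix has 4 rows and 5 columns.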
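The kernel argument in the proof of Lemma 2 is constructive, and can be sketched as follows. The code assumes $q = 2$ and $\mathbb{F}_{q^l} = \mathbb{F}_{2^8}$ with the AES reduction polynomial (our choice of representation). It builds the reduced matrix $D$, extracts one kernel vector by Gaussian elimination, and returns $d(y)$, $c(y)$ and $b(y) = d(y)c(y)$. For three roots and $M = 3$ it reproduces the first case of Example 4, $c(y) = y + \gamma_1 + \gamma_2 + \gamma_3$. The function names are hypothetical.

```python
POLY = 0x11B  # F_{2^8} = F_2[x]/(x^8 + x^4 + x^3 + x + 1); q = 2 (assumed representation)


def mul(a, b):
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def inv(a):
    """a^254 = a^(-1) for a non-zero a."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = mul(r, a)
        a = mul(a, a)
        e >>= 1
    return r


def poly_mul(p, r):
    """Product of coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(r):
            out[i + j] ^= mul(a, c)
    return out


def evaluate(coeffs, y):
    acc = 0
    for c in reversed(coeffs):
        acc = mul(acc, y) ^ c
    return acc


def kernel_vector(rows, ncols):
    """A non-zero v with rows . v = 0, via reduced row echelon form (needs rank < ncols)."""
    m = [row[:] for row in rows]
    pivots = []
    for c in range(ncols):
        r = len(pivots)
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        f = inv(m[r][c])
        m[r] = [mul(f, v) for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                coef = m[i][c]
                m[i] = [u ^ mul(coef, v) for u, v in zip(m[i], m[r])]
        pivots.append(c)
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1
    for row, c in enumerate(pivots):
        v[c] = m[row][free]  # v_c = -m[row][free], and minus is plus in characteristic 2
    return v


def lemma2_poly(gammas, M, q=2):
    """Return (d, c, b): d = prod (y - gamma_i), b = d c supported on {0} and {q^j}."""
    N, H = q ** (M - 1), len(gammas)
    assert H <= M
    d = [1]
    for g in gammas:
        d = poly_mul(d, [g, 1])  # y - g = y + g in characteristic 2
    allowed = {0} | {q ** j for j in range(M)}
    ncols = N - H + 1
    rows = [[d[n - i] if 0 <= n - i <= H else 0 for i in range(ncols)]
            for n in range(N + 1) if n not in allowed]  # coefficients that must vanish
    c = kernel_vector(rows, ncols)
    return d, c, poly_mul(d, c)


if __name__ == "__main__":
    d, c, b = lemma2_poly([5, 9, 17], M=3)  # first case of Example 4
    print(c)                                # [29, 1], i.e. c(y) = y + (5 + 9 + 17)
```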
Lemma 3 If $K \le k - 1$ and $H \le M$, there exist non-zero matrices $A_{k\times(M+1)}$ with coefficients in $\mathbb{F}_{q^l}$ such that

$$A_{k\times(M+1)}\, G_{(M+1)\times H} = 0_{k\times H}, \qquad X_{K\times k}\, A = 0_{K\times(M+1)}.$$

Proof. Let $A = A_{k\times(M+1)}$ be a solution to the above system of equations. Then the matrices $rA$ obtained by multiplication with a scalar $r \in \mathbb{F}_{q^l}$ are also clearly solutions, and it is thus enough to show that there exists one suitable non-zero matrix $A$.

To prove that such a matrix exists, we use Lemma 2, for which we will exhibit a suitable bivariate polynomial $F(x,y) = a(x)b(y)$.

Let us start by looking at the second equation. For any choice of $K$ public keys $x_1, \dots, x_K$ in $\mathbb{F}_{q^l}$, take the polynomial $a(x) = (x-x_1)\cdots(x-x_K) = a_0 + a_1x + \dots + a_Kx^K$. It is of degree $K$ and has $x_1, \dots, x_K$ as roots. Thus

$$a(x) = (1, x, \dots, x^K)\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_K \end{pmatrix} = 0 \quad\text{for } x = x_1, \dots, x_K.$$



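We now consider the first equation $AG = 0$. We start by rewriting it in a different form. Recall that the matrix $G$ is of the form

$$G = \begin{pmatrix} \sum_{j=1}^n g_j(e_{i_1}) & \cdots & \sum_{j=1}^n g_j(e_{i_H}) \\ \sum_{j=1}^n g_j(e_{i_1}) s_j & \cdots & \sum_{j=1}^n g_j(e_{i_H}) s_j \\ \vdots & & \vdots \\ \sum_{j=1}^n g_j(e_{i_1}) s_j^{q^{M-1}} & \cdots & \sum_{j=1}^n g_j(e_{i_H}) s_j^{q^{M-1}} \end{pmatrix}.$$

Note that for any invertible matrix $\mathcal{P}$, we have that

$$AG = 0 \iff AG\mathcal{P} = 0,$$

and there exists an invertible matrix $\mathcal{D}$ such that $G\mathcal{D}$ is of the Vandermonde-like form

$$G\mathcal{D} = \begin{pmatrix} 1 & \cdots & 1 \\ \gamma_1 & \cdots & \gamma_H \\ \gamma_1^q & \cdots & \gamma_H^q \\ \vdots & & \vdots \\ \gamma_1^{q^{M-1}} & \cdots & \gamma_H^{q^{M-1}} \end{pmatrix}. \qquad (4)$$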
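Indeed, if all the coefficients of the first row of $G$ are non-zero, we can take $\mathcal{D}$ to be

$$\mathcal{D} = \mathrm{diag}\left(\Big(\sum_{j=1}^n g_j(e_{i_1})\Big)^{-1}, \dots, \Big(\sum_{j=1}^n g_j(e_{i_H})\Big)^{-1}\right),$$

in which case we have

$$\gamma_m = \frac{\sum_{j=1}^n g_j(e_{i_m}) s_j}{\sum_{j=1}^n g_j(e_{i_m})}, \quad m = 1, \dots, H,$$

where the denominator is in $\mathbb{F}_q$ since it only depends on the network coding coefficients. The denominator is therefore fixed by the map $z \mapsto z^{q^t}$, and together with $\big(\sum_j g_j(e_{i_m}) s_j\big)^{q^t} = \sum_j g_j(e_{i_m}) s_j^{q^t}$ this shows that the $(t+2)$th row of $G\mathcal{D}$ is indeed $(\gamma_1^{q^t}, \dots, \gamma_H^{q^t})$. If one coefficient of the first row is zero (say the first one, the remaining ones being non-zero), then we can first compute $GS$ with

$$S = \begin{pmatrix} 1 & 0 & \\ 1 & 1 & \\ & & I_{H-2} \end{pmatrix},$$

which yields

$$GS = \begin{pmatrix} \sum_{j=1}^n \big(g_j(e_{i_1}) + g_j(e_{i_2})\big) & \sum_{j=1}^n g_j(e_{i_2}) & \cdots & \sum_{j=1}^n g_j(e_{i_H}) \\ \vdots & \vdots & & \vdots \\ \sum_{j=1}^n \big(g_j(e_{i_1}) + g_j(e_{i_2})\big) s_j^{q^{M-1}} & \sum_{j=1}^n g_j(e_{i_2}) s_j^{q^{M-1}} & \cdots & \sum_{j=1}^n g_j(e_{i_H}) s_j^{q^{M-1}} \end{pmatrix}.$$

Now take

$$\mathcal{D}' = \mathrm{diag}\left(\Big(\sum_{j=1}^n g_j(e_{i_2})\Big)^{-1}, \Big(\sum_{j=1}^n g_j(e_{i_2})\Big)^{-1}, \dots, \Big(\sum_{j=1}^n g_j(e_{i_H})\Big)^{-1}\right)$$

and $\mathcal{D} = S\mathcal{D}'$ to finally obtain

$$\gamma_1 = \frac{\sum_{j=1}^n \big(g_j(e_{i_1}) + g_j(e_{i_2})\big) s_j}{\sum_{j=1}^n g_j(e_{i_2})}, \qquad \gamma_m = \frac{\sum_{j=1}^n g_j(e_{i_m}) s_j}{\sum_{j=1}^n g_j(e_{i_m})}, \quad m \ge 2.$$

Thus we can assume that we look at

$$AG = 0$$

with $G$ of the form (4).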
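The normalization step can be illustrated with coding coefficients from a subfield larger than $\mathbb{F}_2$ (with $q = 2$ every non-zero column sum equals 1, so $\mathcal{D}$ would be the identity). The sketch below assumes $q = 4$ and $l = 4$, realizing $\mathbb{F}_{4^4} = \mathbb{F}_{2^8}$ with the AES polynomial; its subfield $\mathbb{F}_4 = \{0, 1, \omega, \omega^2\}$ is obtained from $\omega = 3^{85}$, $3$ being a primitive element for this polynomial. It scales each column of $G$ by the inverse of its first entry and checks the Vandermonde-like shape (4). All names and parameters are hypothetical.

```python
import random
from functools import reduce
from operator import xor

POLY = 0x11B  # F_{2^8} = F_{4^4}: coding field F_q with q = 4, l = 4 (assumed)
Q = 4


def mul(a, b):
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def power(a, e):
    r = 1
    while e:
        if e & 1:
            r = mul(r, a)
        a = mul(a, a)
        e >>= 1
    return r


def inv(a):
    return power(a, 254)  # a^(2^8 - 2) = a^(-1) for a != 0


OMEGA = power(0x03, 85)                # 0x03 is primitive here, so OMEGA has order 3
F4 = [0, 1, OMEGA, mul(OMEGA, OMEGA)]  # the subfield F_4 holding the coding coefficients


def build_G(s, coeffs, M):
    """G[0][m] = sum_j g_j(e_m); G[i][m] = sum_j g_j(e_m) s_j^(q^(i-1)) for i >= 1."""
    G = [[0] * len(coeffs) for _ in range(M + 1)]
    for m, g in enumerate(coeffs):
        for gj, sj in zip(g, s):
            G[0][m] ^= gj
            for i in range(1, M + 1):
                G[i][m] ^= mul(gj, power(sj, Q ** (i - 1)))
    return G


def normalize(G):
    """Return G D with D = diag(1 / G[0][m]); the first-row entries must be non-zero."""
    out = [row[:] for row in G]
    for m in range(len(G[0])):
        f = inv(G[0][m])
        for i in range(len(G)):
            out[i][m] = mul(G[i][m], f)
    return out


def is_vandermonde_like(V):
    """Check that every column reads (1, gamma, gamma^q, ..., gamma^(q^(M-1)))."""
    return all(
        V[0][m] == 1 and all(V[i][m] == power(V[1][m], Q ** (i - 1)) for i in range(1, len(V)))
        for m in range(len(V[0]))
    )


def demo(seed=0, n=3, h=4, M=3):
    rng = random.Random(seed)
    s = [rng.randrange(256) for _ in range(n)]
    coeffs = []
    while len(coeffs) < h:
        g = [rng.choice(F4) for _ in range(n)]
        if reduce(xor, g, 0):  # keep edges whose first-row entry is non-zero
            coeffs.append(g)
    return normalize(build_G(s, coeffs, M))


if __name__ == "__main__":
    print(is_vandermonde_like(demo()))
```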

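Now for all choices of $\gamma_1, \dots, \gamma_H$, consider the polynomial $b(y)$ in $\mathbb{F}_{q^l}[y]$ such that $b(\gamma_i) = 0$, $i = 1, \dots, H$, but also such that $b(y) = b_0 + b_1y + b_2y^q + \dots + b_My^{q^{M-1}}$. Such a polynomial exists by the first part of Lemma 2 as long as $H \le M$. Thus

$$b(y) = (1, y, y^q, \dots, y^{q^{M-1}})\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_M \end{pmatrix} = 0 \quad\text{for } y = \gamma_1, \dots, \gamma_H.$$

We can finally write

$$F(x,y) = (1, x, \dots, x^K)\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_K \end{pmatrix}(b_0, b_1, b_2, \dots, b_M)\begin{pmatrix} 1 \\ y \\ y^q \\ \vdots \\ y^{q^{M-1}} \end{pmatrix} = (1, x, \dots, x^K)\, A \begin{pmatrix} 1 \\ y \\ y^q \\ \vdots \\ y^{q^{M-1}} \end{pmatrix}$$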
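for the $(K+1)\times(M+1)$ matrix $A$ with coefficients in $\mathbb{F}_{q^l}$. By the second part of Lemma 2, $A$ satisfies

$$\begin{pmatrix} 1 & x_1 & \cdots & x_1^K \\ 1 & x_2 & \cdots & x_2^K \\ \vdots & \vdots & & \vdots \\ 1 & x_K & \cdots & x_K^K \end{pmatrix} A = 0_{K\times(M+1)}$$

and

$$A \begin{pmatrix} 1 & \cdots & 1 \\ \gamma_1 & \cdots & \gamma_H \\ \gamma_1^q & \cdots & \gamma_H^q \\ \vdots & & \vdots \\ \gamma_1^{q^{M-1}} & \cdots & \gamma_H^{q^{M-1}} \end{pmatrix} = 0_{(K+1)\times H}.$$

Since the matrix $X$ built by the adversaries has $k$ columns, we require $K + 1 \le k$, that is, $K \le k - 1$; when $K + 1 < k$, appending $k - K - 1$ zero rows to $A$ gives a $k\times(M+1)$ matrix satisfying both equations. This is indeed the assumption that we made on $K$ and $k$ in the hypothesis, and it can be interpreted by the fact that if $K \ge k$, then the adversaries can find the source's secret just from the matrix $X$. This concludes the proof.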
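The construction in this proof can be reproduced numerically in the smallest case where $b(y)$ is simply $(y-\gamma_1)(y-\gamma_2)$: with $q = 2$ and $M = H = 2$, this product already has the required form $b_0 + b_1y + b_2y^q$. The sketch below, using the same assumed $\mathbb{F}_{2^8}$ representation as before, takes $K = 2$ public keys and $k = 4$, builds $A = (a_0, \dots, a_K)^T(b_0, b_1, b_2)$, pads it with zero rows, and checks $XA = 0$ and $AG = 0$ for $G$ in the form (4). Function names and parameter values are hypothetical.

```python
POLY = 0x11B  # F_{2^8} with q = 2 (assumed representation)


def mul(a, b):
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def poly_mul(p, r):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(r):
            out[i + j] ^= mul(a, c)
    return out


def matmul(X, Y):
    """Matrix product over F_{2^8}; addition is XOR."""
    Z = [[0] * len(Y[0]) for _ in X]
    for r, row in enumerate(X):
        for c in range(len(Y[0])):
            acc = 0
            for i, x in enumerate(row):
                acc ^= mul(x, Y[i][c])
            Z[r][c] = acc
    return Z


def lemma3_matrix(xs, gammas, k):
    """A = (a_0..a_K)^T (b_0, b_1, b_2), padded with zero rows to k rows.

    a(x) = prod_i (x - x_i) and b(y) = (y - gamma_1)(y - gamma_2); with q = 2 and
    M = H = 2, b already has the form b_0 + b_1 y + b_2 y^q required by Lemma 2.
    """
    assert len(gammas) == 2 and len(xs) <= k - 1
    a = [1]
    for x in xs:
        a = poly_mul(a, [x, 1])  # x - x_i = x + x_i in characteristic 2
    g1, g2 = gammas
    b = [mul(g1, g2), g1 ^ g2, 1]
    A = [[mul(ai, bj) for bj in b] for ai in a]
    return A + [[0, 0, 0] for _ in range(k - len(a))]


def key_rows(xs, k):
    """The matrix X with rows (1, x, ..., x^(k-1))."""
    rows = []
    for x in xs:
        row, p = [], 1
        for _ in range(k):
            row.append(p)
            p = mul(p, x)
        rows.append(row)
    return rows


if __name__ == "__main__":
    xs, gammas, k = [5, 77], [19, 200], 4  # K = 2 adversaries, k = 4 (illustrative)
    A = lemma3_matrix(xs, gammas, k)
    G = [[1, 1], list(gammas), [mul(g, g) for g in gammas]]  # form (4) with M = 2
    print(matmul(key_rows(xs, k), A) == [[0, 0, 0]] * len(xs))
    print(matmul(A, G) == [[0, 0]] * k)
```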



We are now ready to state the security of the proposed authentication scheme. 

Proposition 1 Consider a multicast network implementing linear network coding, in which $V$ nodes are verifying nodes owning a private key for authentication. The above scheme is a $(k, V, M)$ unconditionally secure network coding authentication code against a coalition of up to $k - 1$ adversaries, possibly among the verifying nodes, in which every key can be used to authenticate up to $M$ messages, under the assumption that $H \le M$, where $H$ is the sum of the numbers of incoming edges of the adversaries.

Proof. To make a substitution attack, the $k - 1$ malicious verifying nodes want to generate a message such that it is accepted as authentic by any honest verifying node $R_i$ that they are trying to cheat. For that, however, they need to guess its secret key $[P_0(x_i), \dots, P_M(x_i)]$ and choose a polynomial $A_s(x)$ such that

$$A_s(x_i) = P_0(x_i) + sP_1(x_i) + \dots + s^{q^{M-1}}P_M(x_i)$$

for some message $s$. Gathering all they know after watching one transmission of tagged messages, the coalition of adversaries gets the following system of equations:

$$A_{k\times(M+1)}\, G_{(M+1)\times H} = C_{k\times H}, \qquad X_{K\times k}\, A_{k\times(M+1)} = P_{K\times(M+1)}.$$

If there is no matrix $A_{k\times(M+1)}$ satisfying this system, the information gathered by the adversaries is not useful. Now if such a matrix $A_{k\times(M+1)}$ indeed exists, then there are actually $q^l$ of them satisfying these equations, given by

$$A_{k\times(M+1)} + A'_{k\times(M+1)},$$

where $A'_{k\times(M+1)}$ is a solution of the corresponding homogeneous system of equations, and Lemma 3 tells us that there are $q^l$ such $A'$. Thus there are $q^l$ different $(M+1)$-tuples of polynomials $(P_0(x), \dots, P_M(x))$ likely to be the source's private key, from which there are $q^l$ equally likely private keys for $R_i$. Thus the probability of the $k - 1$ receivers guessing $A_s(x_i)$ correctly is $1/q^l$.
Example 5 Let us go on with Example 2. The node $R_1$ has received the vectors

$$\begin{pmatrix} y(e_1) \\ y(e_2) \end{pmatrix}$$

with

$$y(e_1) = \big(g_1(e_1) + g_2(e_1),\; g_1(e_1)s_1 + g_2(e_1)s_2,\; g_1(e_1)A_{s_1}(x) + g_2(e_1)A_{s_2}(x)\big)$$

and

$$y(e_2) = \big(g_1(e_2) + g_2(e_2),\; g_1(e_2)s_1 + g_2(e_2)s_2,\; g_1(e_2)A_{s_1}(x) + g_2(e_2)A_{s_2}(x)\big).$$

But this time, let us assume that the node $R_1$ is malicious, and instead of checking the authentication tag, it actually wants to make a substitution attack.
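Since we have that

$$\begin{aligned} A_{s_1}(x) &= P_0(x) + s_1P_1(x) + s_1^2P_2(x) \\ &= (a_{00} + a_{10}s_1 + a_{20}s_1^2) + x(a_{01} + a_{11}s_1 + a_{21}s_1^2) \\ &= b_{10} + xb_{11}, \\ A_{s_2}(x) &= P_0(x) + s_2P_1(x) + s_2^2P_2(x) \\ &= (a_{00} + a_{10}s_2 + a_{20}s_2^2) + x(a_{01} + a_{11}s_2 + a_{21}s_2^2) \\ &= b_{20} + xb_{21}, \end{aligned}$$

we can rewrite

$$\begin{aligned} g_1(e_1)A_{s_1}(x) + g_2(e_1)A_{s_2}(x) &= g_1(e_1)(b_{10} + xb_{11}) + g_2(e_1)(b_{20} + xb_{21}) \\ &= g_1(e_1)b_{10} + g_2(e_1)b_{20} + x[g_1(e_1)b_{11} + g_2(e_1)b_{21}]. \end{aligned}$$

The malicious node thus knows

$$c_{10} = g_1(e_1)b_{10} + g_2(e_1)b_{20}, \qquad c_{11} = g_1(e_1)b_{11} + g_2(e_1)b_{21}.$$

Alternatively, we can rewrite

$$\begin{aligned} g_1(e_1)A_{s_1}(x) + g_2(e_1)A_{s_2}(x) &= g_1(e_1)(a_{00} + a_{10}s_1 + a_{20}s_1^2) + g_2(e_1)(a_{00} + a_{10}s_2 + a_{20}s_2^2) \\ &\quad + xg_1(e_1)(a_{01} + a_{11}s_1 + a_{21}s_1^2) + xg_2(e_1)(a_{01} + a_{11}s_2 + a_{21}s_2^2) \\ &= a_{00}(g_1(e_1) + g_2(e_1)) + a_{10}(g_1(e_1)s_1 + g_2(e_1)s_2) + a_{20}(g_1(e_1)s_1^2 + g_2(e_1)s_2^2) \\ &\quad + x[a_{01}(g_1(e_1) + g_2(e_1)) + a_{11}(g_1(e_1)s_1 + g_2(e_1)s_2) + a_{21}(g_1(e_1)s_1^2 + g_2(e_1)s_2^2)]. \end{aligned}$$

Since the malicious node knows $g_1(e_1) + g_2(e_1)$, $g_1(e_1)s_1 + g_2(e_1)s_2$ and $g_1(e_1)s_1^2 + g_2(e_1)s_2^2$ (the last one being the square of the second, because $q = 2$ and the coding coefficients lie in $\mathbb{F}_2$), and by iterating the computations for the second incoming edge, it can form the following system of linear equations:

$$\begin{pmatrix} a_{0,0} & a_{1,0} & a_{2,0} \\ a_{0,1} & a_{1,1} & a_{2,1} \end{pmatrix} G = \begin{pmatrix} c_{10} & c_{20} \\ c_{11} & c_{21} \end{pmatrix},$$

where

$$G = \begin{pmatrix} g_1(e_1) + g_2(e_1) & g_1(e_2) + g_2(e_2) \\ g_1(e_1)s_1 + g_2(e_1)s_2 & g_1(e_2)s_1 + g_2(e_2)s_2 \\ g_1(e_1)s_1^2 + g_2(e_1)s_2^2 & g_1(e_2)s_1^2 + g_2(e_2)s_2^2 \end{pmatrix}.$$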

If $R_1$ is not a verifying node, it has to prepare an attack based on the knowledge of this system of equations alone. We can illustrate the condition $H \le M$ required for security. Suppose that it were not the case, that is, $H = 2$ but we have only $M = 1$, meaning that only two polynomials $P_0$ and $P_1$ are used to create the authentication tag. Then the matrix $G$ would be a $2\times 2$ matrix, and thus very likely invertible, allowing the malicious node to recover the secret coefficients of the source private key as $A = CG^{-1}$, although the node cannot decode the message.

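Now if, furthermore, $R_1$ has a private key $[P_0(x_1), P_1(x_1), P_2(x_1)]$, it further knows that

$$(1, x_1)\begin{pmatrix} a_{0,0} & a_{1,0} & a_{2,0} \\ a_{0,1} & a_{1,1} & a_{2,1} \end{pmatrix} = (P_0(x_1), P_1(x_1), P_2(x_1)).$$

Let us assume for this example that the first row of $G$ has non-zero coefficients, so that both coefficients are invertible. We set

$$\gamma_1 = (g_1(e_1)s_1 + g_2(e_1)s_2)(g_1(e_1) + g_2(e_1))^{-1}, \qquad \gamma_2 = (g_1(e_2)s_1 + g_2(e_2)s_2)(g_1(e_2) + g_2(e_2))^{-1},$$

and we can rewrite $G$ as

$$G = \begin{pmatrix} g_1(e_1) + g_2(e_1) & g_1(e_2) + g_2(e_2) \\ \gamma_1(g_1(e_1) + g_2(e_1)) & \gamma_2(g_1(e_2) + g_2(e_2)) \\ \gamma_1^2(g_1(e_1) + g_2(e_1)) & \gamma_2^2(g_1(e_2) + g_2(e_2)) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \gamma_1 & \gamma_2 \\ \gamma_1^2 & \gamma_2^2 \end{pmatrix}\begin{pmatrix} g_1(e_1) + g_2(e_1) & 0 \\ 0 & g_1(e_2) + g_2(e_2) \end{pmatrix}.$$

It is a straightforward computation to check that the matrices $rA$, $r \in \mathbb{F}_{q^l}$, with

$$A = \begin{pmatrix} -x_1\gamma_1\gamma_2 & x_1\gamma_1 + x_1\gamma_2 & -x_1 \\ \gamma_1\gamma_2 & -\gamma_1 - \gamma_2 & 1 \end{pmatrix}$$

satisfy the system of equations $AG = 0$, $XA = 0$, where $X = (1, x_1)$.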
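The claims of Example 5, together with the counting step in the proof of Proposition 1, can be checked numerically. The sketch below assumes $\mathbb{F}_{q^l} = \mathbb{F}_{2^8}$ ($q = 2$, $l = 8$) with the AES polynomial, a random true secret $A_0$, and the coding vectors $g(e_1) = (1, 0)$, $g(e_2) = (0, 1)$ (an arbitrary choice that makes both first-row entries of $G$ non-zero). It verifies that the matrix $A$ above solves the homogeneous system, that all $2^8$ matrices $A_0 + rA$ are consistent with what $R_1$ observed, and that they yield $2^8$ distinct candidate keys for another verifier with public value $x_2$. Names and values are hypothetical.

```python
import random

POLY = 0x11B  # F_{2^8}: q = 2, l = 8 (assumed representation)


def mul(a, b):
    """Multiply two elements of F_{2^8} written as 8-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r


def inv(a):
    """a^254 = a^(-1) for a non-zero a in F_{2^8}."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = mul(r, a)
        a = mul(a, a)
        e >>= 1
    return r


def matmul(X, Y):
    """Matrix product over F_{2^8}; addition is XOR."""
    Z = [[0] * len(Y[0]) for _ in X]
    for r, row in enumerate(X):
        for c in range(len(Y[0])):
            acc = 0
            for i, x in enumerate(row):
                acc ^= mul(x, Y[i][c])
            Z[r][c] = acc
    return Z


def example5(seed=0, x1=5, x2=9):
    """Number of distinct keys of a verifier with public value x2 that are consistent
    with everything R_1 (public value x1) knows in Example 5."""
    rng = random.Random(seed)
    A0 = [[rng.randrange(256) for _ in range(3)] for _ in range(2)]  # true secret, k = 2, M = 2
    s = [rng.randrange(256) for _ in range(2)]                       # messages s_1, s_2
    g = [[1, 0], [0, 1]]                                             # coding vectors of e_1, e_2
    G = [[0, 0] for _ in range(3)]
    for m in range(2):
        for j in range(2):
            if g[m][j]:
                G[0][m] ^= 1
                G[1][m] ^= s[j]
                G[2][m] ^= mul(s[j], s[j])
    C = matmul(A0, G)              # tag coefficients received by R_1
    X = [[1, x1]]
    P = matmul(X, A0)              # R_1's own private key
    g1, g2 = (mul(G[1][m], inv(G[0][m])) for m in range(2))  # gamma_1, gamma_2
    # The matrix A of Example 5; the minus signs disappear in characteristic 2.
    Ah = [[mul(x1, mul(g1, g2)), mul(x1, g1 ^ g2), x1],
          [mul(g1, g2), g1 ^ g2, 1]]
    assert matmul(Ah, G) == [[0, 0], [0, 0]] and matmul(X, Ah) == [[0, 0, 0]]
    keys = set()
    for r in range(256):
        Ar = [[A0[i][j] ^ mul(r, Ah[i][j]) for j in range(3)] for i in range(2)]
        assert matmul(Ar, G) == C and matmul(X, Ar) == P  # indistinguishable for R_1
        keys.add(tuple(matmul([[1, x2]], Ar)[0]))
    return len(keys)


if __name__ == "__main__":
    print(example5())  # 256 = q^l candidate keys for the honest verifier
```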



5 Multicast Goodput Analysis 

In this section, we discuss the performance of our scheme in terms of multicast throughput and multicast 
goodput. The multicast goodput is analyzed to assess the impact of pollution attacks in network coding 
systems and to show how much the multicast throughput is degraded under such attacks. 

The analysis starts with definitions of multicast throughput and multicast goodput. We then derive 
their characterizations in our setting, depending on whether the proposed authentication scheme is used. 
We provide three exemplary topologies with various numbers of intermediate nodes, shown in Figure 2, to illustrate the multicast goodput gains obtained using our scheme.

5.1 Definitions 

Recall that we have a single source $S$, sending $n$ messages to $T$ destination nodes $D_1, \dots, D_T$, while $\mathcal{V}$ will denote the set of $V$ receivers $R_1, \dots, R_V$ that can verify the authentication tags. The intermediate nodes may or may not have been corrupted by malicious messages. We will denote by $\mathcal{R}_c$ the set of intermediate nodes with corrupted messages in their incoming buffers and by $\mathcal{R}_g$ the set of intermediate nodes with "good" (i.e., non-corrupted) messages in their incoming buffers, with cardinalities $|\mathcal{R}_c| = r_c$ and $|\mathcal{R}_g| = r_g$, respectively.

We consider a single multicast session $s(S, n, \mathcal{R}, \mathcal{D}, r_c)$ where the source node $S$ delivers $n$ messages to all nodes in a destination set $\mathcal{D} \subseteq \{D_1, \dots, D_T\}$ through multi-hop paths in a set $\mathcal{R}$ of intermediate nodes containing corrupted nodes.

We define the following performance metrics: 

• The message rate of a multicast session $s(S, n, \mathcal{R}, \mathcal{D}, r_c)$ is termed the multicast throughput, and is denoted by $R_{s\mathcal{D}}$.

• The rate of messages successfully delivered to each destination per session $s$ is termed the throughput per destination and is denoted by $R_{sD_i}$ for the destination $D_i$.

• The rate of non-corrupted messages of a multicast session $s(S, n, \mathcal{R}, \mathcal{D}, r_c)$ is termed the multicast goodput. It is denoted by $G_{s\mathcal{D}}$ if our scheme is used, and by $G'_{s\mathcal{D}}$ otherwise.

• The rate of non-corrupted messages delivered to each destination per session $s$ is termed the goodput per destination and is denoted by $G_{sD_i}$ for the destination $D_i$ if our scheme is used, and by $G'_{sD_i}$ otherwise.



5.2 Multicast goodput analysis without the authentication scheme 

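Pollution attacks degrade the multicast throughput $R_{s\mathcal{D}}$ of a session by a degradation factor $\alpha \in [0,1]$, resulting in a multicast goodput of the form

$$G'_{s\mathcal{D}} = \alpha R_{s\mathcal{D}}.$$

The multicast goodput of a session $s(S, n, \mathcal{R}, \mathcal{D}, r_c)$ depends on the topology of the network and is given by

$$G'_{s\mathcal{D}} = \left(1 - \frac{n_{r_c}}{n_{\mathcal{D}}}\right) R_{s\mathcal{D}}, \qquad (5)$$

where $n_{r_c}$ is the number of paths corrupted by the $r_c$ corrupted nodes, i.e., the paths from the corrupted intermediate nodes $\mathcal{R}_c$ to the destinations in $\mathcal{D}$, and $n_{\mathcal{D}}$ is the number of incoming edges in the destination set $\mathcal{D}$. The multicast goodput varies depending on the positions of the $r_c$ corrupted intermediate nodes in the network.

The average multicast goodput of a session $s(S, n, \mathcal{R}, \mathcal{D}, r_c)$ over all $\Lambda$ positions of the $r_c$ corrupted intermediate nodes in the network is expressed by

$$\overline{G'_{s\mathcal{D}}} = \frac{1}{\Lambda}\sum_{j=1}^{\Lambda} G'_{s\mathcal{D},j}, \qquad (6)$$

where $G'_{s\mathcal{D},j}$ is the goodput (5) for the $j$th position and $\Lambda$ is the number of combinations of $V_r$ over $r_c$: $\Lambda = C_{V_r}^{r_c} = \frac{V_r!}{r_c!\,(V_r - r_c)!}$.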



5.3 Multicast goodput analysis with the authentication scheme 

With our authentication tags, if $\mathcal{R}_c \subseteq \mathcal{V}$, intermediate nodes in the network can verify the integrity and origin of the messages received without having to decode. They can detect and discard the corrupted messages in transit that fail the verification.

The corrupted messages are discarded at their entrance in the network, and therefore do not propagate in the network towards the destinations. The multicast goodput is thus not degraded ($\alpha = 1$) and is equal to the multicast throughput:

$$G_{s\mathcal{D}} = R_{s\mathcal{D}}. \qquad (7)$$

The average multicast goodput gain offered by our scheme is expressed as follows:

$$\begin{aligned} \text{Gain} &= \overline{G_{s\mathcal{D}}} - \overline{G'_{s\mathcal{D}}} && (8) \\ &= R_{s\mathcal{D}} - \overline{G'_{s\mathcal{D}}}, && (9) \end{aligned}$$

where $\overline{G'_{s\mathcal{D}}}$ is the average multicast goodput obtained without the use of our scheme.
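Equations (5) to (9) are simple enough to evaluate mechanically once the corrupted-path counts $n_{r_c}$ of each placement are known. The sketch below is a hypothetical helper that takes those counts as input (it does not derive them from a topology), uses exact fractions, and optionally checks the number of placements against $\Lambda$ as written in (6). The numbers in the usage lines are illustrative only and are not taken from Tables 3 to 6.

```python
from fractions import Fraction
from math import comb


def goodput_without_scheme(R, n_rc, n_D):
    """Equation (5): G'_sD = (1 - n_rc / n_D) R_sD."""
    return (1 - Fraction(n_rc, n_D)) * R


def average_goodput_without_scheme(R, n_rc_per_placement, n_D, V_r=None, r_c=None):
    """Equation (6): mean of (5) over the Lambda placements of the r_c corrupted nodes."""
    if V_r is not None and r_c is not None:
        assert len(n_rc_per_placement) == comb(V_r, r_c)  # Lambda = C(V_r, r_c)
    values = [goodput_without_scheme(R, c, n_D) for c in n_rc_per_placement]
    return sum(values) / len(values)


def goodput_gain(R, n_rc_per_placement, n_D, **kwargs):
    """Equations (8)-(9): with the scheme G_sD = R_sD by (7), so Gain = R_sD - mean G'_sD."""
    return R - average_goodput_without_scheme(R, n_rc_per_placement, n_D, **kwargs)


if __name__ == "__main__":
    # Hypothetical example: 3 candidate positions, 1 corrupted node, 4 destination edges.
    counts = [2, 1, 1]
    print(average_goodput_without_scheme(Fraction(1), counts, 4, V_r=3, r_c=1))  # 2/3
    print(goodput_gain(Fraction(1), counts, 4, V_r=3, r_c=1))                    # 1/3
```

Exact fractions keep the per-placement goodputs readable as ratios of $R_{s\mathcal{D}}$, which is how goodput values are typically reported.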

Let us now present a few examples based on different topologies. 
Topology a). In Figure 2 we consider the topology a) with various configurations $\mathcal{R}_c$ (this is also the topology discussed in Example 1).
Figure 2: Examples of network topologies.
Table 3: Multicast goodput results for Topology a)

• If $n = 3$, $r_c = 1$ and our scheme is not used, we obtain:

1 2 

• If n = 3, Tc = 2 and our scheme is not used, then Rg — and : 

$G'_{s\mathcal{D}} = 0.$

Topology b). In Figure 2 we consider the topology b) with again various configurations of $\mathcal{R}_c$.

If $n = 2$, $r_c = 1$ and our scheme is not used, we have two possibilities for the intermediate receiver that holds corrupted packets:

• If the intermediate receiver with corrupted messages is on the first hop from the source (i.e., $R_2$), then

• If the intermediate receiver with corrupted messages is on the second hop from the source (i.e., R3), 
then ^ 

G'sv = -^Rsv = -j^Rsv- 
Table 4: Multicast Goodput results for Topology b)
Table 5: Multicast Goodput results for Topology c)

If Tc = 2 and our scheme is not used, then we have = 1, and there are two possibilities again: 

• If the intermediate receivers with corrupted messages are on the first hop from the source (i.e., $R_2$), then

$$G'_{sD_i} = 0, \qquad G'_{s\mathcal{D}} = 0.$$

• If one intermediate receiver with corrupted messages is on the second hop from the source (i.e. R3) 
and the other is on the first hop from the source (i.e., R2), then 

G'sv = -^Rsv- 

If Tc = 3 and our scheme is not used, we have = and Gg-p = 0. The multicast goodput results are 
summarized in Table 4.

Topology c). In the topology c), we also consider various configurations of $\mathcal{R}_c$. The multicast goodput results are summarized in Table 5.

If we now consider the goodput gains with our scheme for topologies a), b) and c), we get that $G_{s\mathcal{D}} = R_{s\mathcal{D}}$ for all $r_c$. In the three topologies, our scheme offers the multicast goodput gains given in Table 6. As the number of corrupted messages injected into the network increases, the average multicast goodput gain naturally tends towards 1.
Table 6: Average goodput gains obtained with our scheme

[Figure: a message made of N IP packets of 1500 bytes each (l = 1500 × N bytes), together with its tag.]
Figure 3: Structure of a message



6 Application to File distribution 

In this section, we present how the proposed $(k, V, M)$ authentication scheme can be applied to content or file distribution. For content distribution over an IP-based network with our scheme, at most $M$ messages forming the file to be distributed can be transmitted by the source through the network in an authenticated way using the same key. For our scheme to be secure against a coalition of $k - 1$ receivers, we recall the following rules:

• $M \ge n$, where $n$ is the number of messages to be sent by the source.

• $M \ge H$, where $H$ is the maximum number of incoming edges in a coalition of malicious nodes.

We also define $N$ as the size of the generation of IP packets carrying one message authenticated by one tag. Figure 3 illustrates the relation between an IP packet and a message.
In a practical scenario, the following should be considered:

• A message consists of $l$ symbols $s_{ij}$, each symbol being a bit.

• One message authenticated by one tag consists of $N$ IP packets (also called a generation).

• IP packets are 1500 bytes long (12000 bits) with a payload of 1480 bytes.

• The message length $l$ can be expressed in bits and in bytes. We denote by $l_{bits}$ the message length expressed in bits and by $l_{bytes}$ the message length expressed in bytes:

$$l_{bytes} = 1500 \times N, \qquad l_{bits} = 8 \times l_{bytes} = 12000 \times N.$$

For $M \le l_{bits}$, we have $M \le 12000 \times N$, which means that the source can use the same key to tag at most $M = 12000 \times N$ messages of length $12000N$ bits (carried over $N$ IP packets that are 12000 bits long).

Destinations can download a file with at most the following size in bytes (including headers):

$$M \times l_{bytes} = l_{bits} \times l_{bytes} = 8 \times l_{bytes}^2 = 8 \times (1500N)^2 = 18 \times 10^6 \times N^2 \text{ bytes}.$$

The destinations can use the same key to authenticate a file download of at most $18N^2$ MBytes when one tagged message is carried over $N$ IP packets.
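The arithmetic above, which produces the rows of Table 7, can be reproduced in a few lines. The helper below is a hypothetical sketch that takes $M = l_{bits}$ (the largest value allowed by $M \le l_{bits}$) and 1500-byte IP packets, as in the text.

```python
def file_parameters(N: int, ip_packet_bytes: int = 1500):
    """Return (l_bytes, l_bits, M, max_file_bytes) for a generation of N IP packets."""
    l_bytes = ip_packet_bytes * N   # message length in bytes
    l_bits = 8 * l_bytes            # message length in bits (12000 N)
    M = l_bits                      # largest M permitted by M <= l_bits
    max_file_bytes = M * l_bytes    # 8 (1500 N)^2 = 18e6 N^2 bytes
    return l_bytes, l_bits, M, max_file_bytes


if __name__ == "__main__":
    for N in (1, 2, 10, 15):
        l_bytes, _, M, size = file_parameters(N)
        print(f"N={N:>2}  l={l_bytes:>6} bytes  M={M:>7}  max file={size / 1e6:g} MB")
```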
File size   Generation size   Message length   Nb of messages authenticated
(bytes)     N                 l (bytes)        by the same key M
18M         1                 1500             12 000
72M         2                 3000             24 000
1.8G        10                15K              120 000
4.05G       15                22.5K            180 000

Table 7: Parameters of our scheme for distribution of files of variable sizes



A receiver node can have at most $12000 \times N$ incoming edges and the source $S$ can send $n \le 12000 \times N$ messages.

The scenarios in Table 7 show what the size of an IP packet generation should be to allow a given file to be distributed and authenticated under the same key.

For distributing a file of 18 MBytes, it is sufficient for the source to send each tagged message in one IP packet of 1500 bytes. The source then sends 12 000 tagged messages that form the 18 MBytes file. Any destination can verify, with the same key, each tag attached to the 12 000 messages.

For distributing a file of 1.8 GBytes, the source generates tagged messages of size 15 KBytes. Each message is sent in a generation of 10 IP packets. The source sends 120K tagged messages that form the file. At the destination, the same key can be used to verify the tags of the 120K messages received.

7 Conclusion 

In this paper, we have proposed an unconditionally secure authentication scheme that provides multicast linear network coding with message integrity protection and source authentication. The resulting scheme offers robustness against pollution attacks from outsiders and from $k - 1$ insiders. Our solution allows the source to generate authentication tags for up to $M$ messages with the same key, and the intermediate nodes to verify the authentication tags of the packets received and thus to detect and discard the malicious packets that fail the verification. The performance analysis showed that our scheme offers goodput gains that tend towards 1 as the number of corrupted packets in the network increases. Our scheme can be used to authenticate with the same key a file download of at most $18N^2$ MBytes when one tagged message is carried over $N$ IP packets.

Future work will involve optimization of the parameters involved in the authentication scheme for a more efficient solution. Another aspect to consider in the future is to offer more flexibility regarding the sender, as the scheme proposed here requires the sender to be designated.